Language Resources (lexicons, corpora, ontologies, …)
Standards and language technologies

Nicoletta Calzolari
Istituto di Linguistica Computazionale - CNR - Pisa
glottolo@ilc.cnr.it

Doctoral course (Dottorato), Pisa, May 2009

With many others at ILC

Page 2: Why were such needed LRs lacking after 30 years of R&D in the field?

(Old slide with Antonio Zampolli, ’80s/early ’90s)

1) Because the main trend until the mid-’80s was to privilege the processing of “critical” phenomena, studied by the dominating linguistic theories, rather than the deep analysis of the real uses of a language. As a result, CL was focusing on:
• few examples, often artificially built
• lexicons made of few entries (toy lexicons)
• grammars with poor coverage

2) Because large-scale LRs are costly and their production requires a big organizing effort.

Why do we still lack them?

Page 3: Historical notes: pioneering research

The beginnings…
After many years of complete disregard (or even disdain and contempt) for LRs, due mainly to the prevalence and influence of the generativist school.

Don Walker & Antonio Zampolli

Work on Machine Readable Dictionaries:
• Early interest: to make them machine-tractable and to extract information from them, with much less powerful tools than now
• Precursor of the trend of automatic acquisition from corpora
• Acquilex (Pisa et al.)
• Work on/with the Longman dictionary (Las Cruces)
• NSF & EC International Cooperation grant, promoted by Wilks, Zampolli, Calzolari (Las Cruces & Pisa)

Page 4: … back from the ’70s/’80s: automatic acquisition of lexical information from MRDs

This was at the centre of activities in the Pisa group, Amsler, Briscoe, Boguraev, Wilks’ group, IBM, then Japanese groups, …
The trend was: “large-scale computational methods for the transformation of machine readable dictionaries (MRDs) into machine tractable dictionaries”.

It became evident that:
• Part of the results of meaning extraction, e.g. many meaning distinctions which could be generalised over lexicographic definitions and automatically captured, were unmanageable at the formal representation level and had to be blurred into unique features and values.
• Unfortunately, it is still difficult today to constrain word-meanings within a rigorously defined organization: by their very nature they tend to evade any strict boundaries.

Page 5: Data-driven approaches

After that pioneering era, production and use of adequate LRs strongly increased:
• The lexicon has become ever more relevant
• Both international and national authorities started investing in the field as never before, interested in technologies and systems which really work and are economically interesting
• The need for empirical methods, based on the analysis of large amounts of data, has been recognized
• LRs must be robust enough for analysing the concrete uses of a language, whether theoretically “interesting” or not

Page 6: Since then …

LRs have acquired larger resonance in the last two decades, when many activities, in Europe and world-wide, have contributed to substantial advances in the knowledge of and capability for how to represent, create, acquire, access, exploit, harmonise, tune, maintain, distribute, etc. large lexical and textual repositories.

In Europe an essential role was played by the EC, through initiatives that saw the participation of many EU groups, linked over the years by sharing common approaches and visions:
• NERC
• PAROLE
• SIMPLE
• EuroWordNet
• EAGLES
• ISLE
• ELSNET
• RELATOR
• …

Page 7: … back from the late ’80s

After acquisition from MRDs, automatic acquisition of information from texts:

This trend has today become a consolidated fact, and we have moved from focusing on the acquisition of “linguistic information” (as at the beginning) to the broader acquisition of “general knowledge”, with more data-intensive, robust, reliable methods.

Page 8: We started building: LRs as necessary infrastructure (Lexicons/Corpora), both for research & applications

• LRs give NLP systems the knowledge needed for the various kinds of linguistic processing
• Realisation that most of the needed information escapes individual “introspection” and can only be acquired by analysing large textual corpora attesting language use in different fields/communicative contexts
• BUT: need of adequate models to handle actual usage of language
• By-product?: importance of statistical methods

Lesson: going from core sets to large coverage has implications not just in quantitative terms but, more interestingly, in terms of changes to the models and to the strategies of processes.

Page 9: What have we (LT & LR) been assembling for many years?

… components of a very large infrastructure of LRs & LT:
• Lexicons & their Ontologies: Written, Spoken, ItalWordNets, PAROLE/SIMPLE, FrameNets, …
• Annotated corpora/Treebanks
• Basic Tools
• Integrated Architecture for:
  Annotation at various levels (from morph. to conceptual)
  Acquisition/learning
  Classification
  Ontology creation
  …
• Methodologies
• Know-how & expertise
• Infrastructural bodies (on which to build)
• Standards

Page 10: History: some international LRs initiatives

ACQUILEX [since ’88], MULTILEX, ET-7, ET-10, TEI, NERC, RELATOR, ONOMASTICA, MULTEXT, COLSIT, LSGRAM, DELIS, EAGLES, PAROLE, SIMPLE, SPARKLE, ELSNET, EuroWordNet, MATE, NITE, Cluster 488 (Italian), TAL (Italian), ISLE, ENABLER, INTERA, LIRICS, …, Senseval/Semeval, WRITE, Forum TAL (Italian), …, ISO, ELRA, LREC, LRE Journal, NEDO, Language Grid, BootStrep, KYOTO, …

• Essential role of the EC in starting a basic infrastructure
• EU at the forefront in the areas of LRs and standards in the ’90s
• Established a model

Page 11: Today: a broad “potential” infrastructure

Vitality & success signs… for LRs

Cooperative initiatives (EU, international, national) and links to…
• RELATOR, EAGLES/ISLE, ENABLER, ELSNET, TELRI, INTERA, LIRICS, …, ELRA
• BLARK, Unified Lexicon (W/S)
• LREC, LRE journal, …, ERANET-LangNet, …
• LDC & others, ISO, COCOSDA/WRITE, US Cyberinfrastructure
• Japan COE21, NEDO, Language Grid, …
• FLaReNet (ICT), CLARIN (ESFRI)

Page 12: WordNets: synsets linked by semantic relations

{casa, abitazione, dimora}
{home, domicile, ..} {house}

Hyperonym: {edificio, ..}
Hyponym: {villetta} {catapecchia, bicocca, ..} {cottage} {bungalow}
Role_location: {stare, abitare, ...}
Role_target_direction: {rincasare}
Role_patient: {affitto, locazione}
Mero_part: {vestibolo} {stanza}
Holo_part: {casale} {frazione} {caseggiato}

TOP Concepts: Object, Artifact, Building
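The synset on this slide can be sketched as a small data structure. This is only an illustration, not the ItalWordNet storage format: the class `Synset` and its methods `link` and `targets` are hypothetical names, and each set holds only the lemmas shown on the slide (the ".." elisions are left out).

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of synonymous lemmas plus typed links to other synsets."""
    lemmas: frozenset
    # relation name -> list of target synsets (each a frozenset of lemmas)
    relations: dict = field(default_factory=dict)

    def link(self, relation, *targets):
        """Attach one or more target synsets under the given relation."""
        self.relations.setdefault(relation, []).extend(
            frozenset(t) for t in targets
        )

    def targets(self, relation):
        """Return the target synsets of a relation as sorted lemma lists."""
        return [sorted(t) for t in self.relations.get(relation, [])]


# Only the lemmas visible on the slide are encoded here.
casa = Synset(frozenset({"casa", "abitazione", "dimora"}))
casa.link("hyperonym", {"edificio"})
casa.link("hyponym", {"villetta"}, {"catapecchia", "bicocca"},
          {"cottage"}, {"bungalow"})
casa.link("role_location", {"stare", "abitare"})
casa.link("role_target_direction", {"rincasare"})
casa.link("role_patient", {"affitto", "locazione"})
casa.link("mero_part", {"vestibolo"}, {"stanza"})
casa.link("holo_part", {"casale"}, {"frazione"}, {"caseggiato"})

print(casa.targets("hyponym"))
```

Keeping the relation name as a dictionary key means a new relation type can be added without changing the class, which fits a resource whose relation inventory is open-ended.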

Page 13: ItalWordNet Semantic Network
[Italian module of EuroWordNet]

• ~55,000 lemmas organized in synonym groups (synsets), structured in hierarchies and linked by ~130,000 semantic relations:
  ~55,000 hyperonymy/hyponymy relations
  ~16,000 relations among different POS (role, cause, derivation, etc.)
  ~2,000 part-whole relations
  ~1,500 antonymy relations, etc.
• Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
• Through the ILI, link to all the European WordNets (de-facto standard) and to the common Top Ontology
• Usable in IR, CLIR, IE, QA, ...
• Possibility of plug-in with domain terminological lexicons (legal, maritime, … linguistic)

Page 14: ItalWordNet: clusters of “Base Concepts” classified according to Ontology Top Concepts

[Figure: tree of Top Concepts (= features) with clusters of Base Concepts (= words) attached. Lexicon or ontology???]

Top Concepts shown: Top, 1stOrderEntity, 2ndOrderEntity, SituationType, SituationComponent, Living, Location, Experience, Physical, Static, Dynamic, Natural, Covering, Part, Group, Composition, Origin, Function, Form, Object, Human, Mental, etc.

Clusters of Base Concepts shown:
• skin, hair, body-covering
• body part, cell, muscle, organ
• direction, distance, spatial property, spatial relation, course, path
• change of position, divide, locomotion, motion
• feel, desire, disturbance, emotion, feeling, humor, pleasance
• church, company, institute, organization, party, union
• human, adult, adult female, adult male, child, native, offspring

Page 15: EWN Top-Ontology (ItalWordNet)

1stOrderEntity
  Origin
    Natural
      Living: Plant, Human, Creature, Animal
    Artifact
  Form
    Substance: Solid, Liquid, Gas
    Object
  Composition: Part, Group
  Function: Vehicle, Representation (MoneyRepresentation, LanguageRepresentation, ImageRepresentation), Software, Place, Occupation, Instrument, Garment, Furniture, Covering, Container, Comestible, Building

2ndOrderEntity
  SituationType
    Dynamic: BoundedEvent, UnboundedEvent
    Static: Property, Relation
  SituationComponent: Cause (Agentive, Phenomenal, Stimulating), Communication, Condition, Existence, Experience, Location, Manner, Mental, Modal, Physical, Possession, Purpose, Quantity, Social, Time, Usage

3rdOrderEntity

Page 16: EuroWordNet multilingual data structure

[Figure: the language-specific wordnets (Italian WN, Spanish WN, Dutch WN, English WN, French WN, German WN, Estonian WN, Czech WN, …) are connected through the ILI, which is in turn linked to the TOP ONTOLOGY (e.g. ANIMAL, LIVING, HUMAN). Example: dog (English), cane (Italian), perro (Spanish) and hond (Dutch) meet at the same ILI record “dog”.]
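The role of the ILI as a pivot between wordnets can be sketched with a toy lookup. This is a minimal illustration under stated assumptions, not the EuroWordNet database format: the record id `"ili:dog"`, the language codes, the function names and the Top Ontology concepts attached to the record are all hypothetical choices made for the example.

```python
# ILI record id -> {language code: lemmas of the linked synset}.
# "ili:dog" is a made-up placeholder, not a real ILI identifier.
ILI = {
    "ili:dog": {"en": ["dog"], "it": ["cane"], "es": ["perro"], "nl": ["hond"]},
}

# ILI record id -> Top Ontology concepts (assumed assignment, for illustration).
TOP_CONCEPTS = {"ili:dog": ["ANIMAL", "LIVING"]}


def ili_records(word, lang):
    """Find the ILI records whose synset in `lang` contains `word`."""
    return [rid for rid, langs in ILI.items() if word in langs.get(lang, [])]


def translate(word, src, tgt):
    """Go from a source-language word to target-language lemmas via the ILI."""
    lemmas = []
    for rid in ili_records(word, src):
        lemmas.extend(ILI[rid].get(tgt, []))
    return lemmas


def top_concepts(word, lang):
    """Collect the Top Ontology concepts reachable from a word via the ILI."""
    concepts = []
    for rid in ili_records(word, lang):
        concepts.extend(TOP_CONCEPTS.get(rid, []))
    return concepts


print(translate("cane", "it", "nl"))
print(top_concepts("perro", "es"))
```

Because every wordnet links only to the ILI, adding a new language requires one set of links rather than one per language pair.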

Page 17: Terminological Wordnets: e.g. Jur-WordNet

Jur-WordNet: extension of ItalWordNet for the juridical domain
(with ITTIG-CNR, Istituto di Teoria e Tecniche dell’Informazione Giuridica)

• Knowledge base for multilingual access to sources of legal information
• Source of metadata for semantic markup of legal texts
• To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.

Page 18: Terminological Lexicon of Navigation

Nolo

Synsets        1,614
Lemmas         2,116
Senses         2,232
Nouns          1,621
Verbs            205
Adjectives        35
Proper Nouns     236

Page 19: SIMPLE Lexicon & Ontology: Multidimensional Type Hierarchy

• Shared by 12 European languages
• Theoretical background: Generative Lexicon (Pustejovsky)
• 157 language-independent SIMPLE semantic types, based on hierarchical and non-hierarchical conceptual relations

Difference of internal complexity:
• Simple types (one-dimensional): characterised in terms of hyperonymic relations
• Unified types (multi-dimensional): only definable through the combination of the relation to their supertype and the reference to orthogonal dimensions of meaning (through the Qualia Structure)

http://www.ilc.cnr.it/clips/CLIPS_ENGLISH.htm

Page 20: PAROLE-SIMPLE-CLIPS Lexicon: … harmonised model for 12 European languages

Page 21: Overall organization

[Figure: components of the overall organization]
• Type Ontology (150 types)
• Template
• SemU: Qualia, Derivation, Polysemy, Event Type
• Pred. Layer: predicate, arguments, selection restrictions
• Instantiation: Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon, ...

Page 22: Model architecture. The first three levels: information content

Phonological Unit
  stress position, vowel openness, cons. pronunciation

Corresp. PhnU-MrphU

Morphological Unit
  PoS (& PoS subcategory), inflectional paradigm

Corresp. MrphU-SynU

Syntactic Unit (syntactic behaviour)
  Synt. Struct 1
    a. head properties
    b. subcat. frame: position list, position restr. (syntactic argument)
  Synt. Struct 2
    a. head properties
    b. subcat. frame: position list, position restr.
  Frameset

Page 23: The semantic level: information types

Semantic Unit

FEATURES
• Ontological type
• Domain
• Event Type
• Semantic properties

RELATIONS AMONG SEMUS
• Extended Qualia Structure
• Synonymy
• Regular Polysemy alt.
• Derivation

Predicative Representation
• lexical predicate
• arguments: sem. role; sem. restr.

Link to syntactic unit

Page 24: Semantic entry content: aumento (increase)

Example: L’aumento dei prezzi di un venti% (the increase in prices by twenty percent)

Entry sections: ontological info., extended qualia info., predicative representation

• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Gloss: accrescimento in dimensione o quantità (growth in size or quantity)
• Eventype: transition
• Domain: general, economics
• Agentivecause: yes
• aumento Isa cambiamento
• aumento resulting_state maggiore
• Direction: up
• Morphological derivation: Eventverb aumentare
• Semantic predicate: PRED_aumentare; 3 arguments
• Type of link: event nominalization
• Arguments description (range, semantic role, selectional restriction):

  Arg0   Protoagent     Human / Institution
  Arg1   ProtoPatient   Entity
  Arg2   Quantifier     Amount

Page 25: Semantic entry: USem3527vaporizzatore

free definition: apparecchio usato per vaporizzare (device used for spraying)
example: un vaporizzatore per piante (a sprayer for plants)

ontological type
  semantic type: Instrument
  unification_path: [Concrete_entity | ArtifactAgentive | Telic]
event type
  eventype: =====
domain information
  cleaning, gardening, cosmetics
qualia features
  =====
Extended Qualia Structure
  USem3527vaporizzatore isa Usem3479apparecchio
  USem3527vaporizzatore has_as_part Usem61633pulsante
  USem3527vaporizzatore created_by UsemD387fabbricare
  USem3527vaporizzatore used_for UsemD66019nebulizzare
regular polysemy
  =====
semantic relations
  USem3527vaporizzatore synonymy USem72288nebulizzatore
  USem3527vaporizzatore instrumentverb Usem5239vaporizzare
predicative representation
  semantic predicate: PRED_vaporizzare-1
  type of link: instrument nominalization
  arguments description (range, semantic role, select. restrictions):
    arg0_vaporizzare_1   Protoagent     Human/Instrument
    arg1_vaporizzare_1   Protopatient   +liquid
    arg2_vaporizzare_1   Location       Concrete_entity

from Nilda Ruimy
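The Extended Qualia Structure of this entry is a list of source-relation-target triples, which can be grouped by qualia role. The sketch below is illustrative only: the mapping from relation names to the four roles (formal, constitutive, agentive, telic) is an assumption, modelled on the `formal: … isa …` notation used in the deck's regulate entry, and the names `QUALIA_ROLE` and `qualia_structure` are hypothetical.

```python
# Relation name -> qualia role. This assignment is an assumption made for
# illustration; it is not read from the SIMPLE specification.
QUALIA_ROLE = {
    "isa": "formal",
    "has_as_part": "constitutive",
    "created_by": "agentive",
    "used_for": "telic",
}

# The four triples of the entry, one per line, copied from the slide.
TRIPLES = """\
USem3527vaporizzatore isa Usem3479apparecchio
USem3527vaporizzatore has_as_part Usem61633pulsante
USem3527vaporizzatore created_by UsemD387fabbricare
USem3527vaporizzatore used_for UsemD66019nebulizzare
"""


def qualia_structure(text):
    """Group 'source relation target' lines by the qualia role of the relation."""
    qualia = {}
    for line in text.splitlines():
        source, relation, target = line.split()
        role = QUALIA_ROLE[relation]
        qualia.setdefault(role, []).append((relation, target))
    return qualia


qualia = qualia_structure(TRIPLES)
for role, links in qualia.items():
    print(role, links)
```

A type such as Instrument, whose unification path mentions Agentive and Telic, would be expected to fill the corresponding roles, which is what the grouped output makes easy to check.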

Page 26: Semantic entry: USem79678regulate

free definition: regulation of a function or a physiological process
example: IL2 negatively regulates IL7

ontological type
  semantic type: Cause_change_of_state
  supertype: Cause_relational_change
event type
  eventype: transition
domain information
  biomedicine
qualia features
  agentive_cause: yes
  resulting_state: yes
Extended Qualia Structure
  formal: Usem79678regulate isa Usem64875process
  constitutive: =====
  agentive: =====
  telic: =====
regular polysemy
  =====
semantic relations
  synonymy: =====
  morpho. derivation: =====
predicative representation
  semantic predicate: PRED_regulate-1
  type of link: master
  arguments description (range, semantic role, select. restrictions):
    arg0_regulate_1   Protoagent     Natural_Substance
    arg1_regulate_1   Protopatient   Natural_Substance

from Nilda Ruimy

Page 27: Semantic entry: USemTH31676parotite

free definition: Infiammazione delle ghiandole parotidi (inflammation of the parotid glands)
example: il bambino ha una parotite (the child has mumps)

ontological type
  semantic type: Disease
  unification_path: [Phenomenon | Agentive]
event type
  eventype: =====
domain information
  Ear-Nose-Throat
qualia features
  agentive_cause: yes
Extended Qualia Structure
  USemTH31676parotite isa USem3868malattia
  USemTH31676parotite affects USem1788ghiandola
  USemTH31676parotite causes Usem72131gonfiore
  USemTH31676parotite caused_by USem1971virus
  USemTH31676parotite typical_of USem3593bambino
regular polysemy
  =====
semantic relations
  USemTH31676parotite synonymy USem79528orecchione
predicative representation
  =====

from Nilda Ruimy

Page 28: Syntactic entry: SYNU_regulateV

example: NF-AT positively regulates IL2, which negatively regulates IL7

head properties
  verb
  auxiliary: have
  passivization: +
subcategorization frame (syntactic arguments)
  P0: subject, mandatory, NP
  P1: object, mandatory, NP
link to Semantic Unit
  USem79678regulate

from Nilda Ruimy

Page 29: Syntax-semantics mapping (1)

[Figure: a SyntacticUnit and a SemanticUnit joined by the correspondence SynU-SemU (Corresp. Syntax-Semantics)]

SyntacticUnit
  syntactic structure 1
    a. head properties
    b. subcat. frame: position, synt. restr.
  syntactic structure 2
    a. head properties
    b. subcat. frame: position, synt. restr.
  Frameset

SemanticUnit
  ontological type
  domain
  semant. class
  event type
  semant. features
  semant. relations
    Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
    regular polysemy
    synonymy
    derivation
  predicative represent.: predicate, arguments, sem. restr., type of link

from Nilda Ruimy

Page 30: Regulate: Syntax-Semantics mapping

SYNTAX
subcategorization frame, id: np-v-np
syntactic arguments:
  P0: subject, mandatory, NP
  P1: object, mandatory, NP

SEMANTICS
predicative representation
  semantic predicate: PRED_regulate-1
  type of link: master
  semantic arguments description (range, semantic role, select. restrictions):
    arg0_regulate_1   Protoagent     Natural_Substance
    arg1_regulate_1   Protopatient   Natural_Substance

synsemcorrespondence:
<Correspondence id="ISObivalent" correspargposl="ARG0-P0 ARG1-P1 "> </Correspondence>

from Nilda Ruimy

Page 31: Syntax-semantic mapping: aumentare (‘to increase’)

SEMANTIC PREDICATE
PRED_aumentare_1
  ARG0: Agent, Entity
  ARG1: Patient, Entity
  ARG2: Undersc., Amount

LINK PREDICATE-SEMANTIC UNIT

SEMANTIC LEVEL
  SemU1_aumentare, Sem.Type: CAUSE_CHANGE_OF_VALUE
  SemU2_aumentare, Sem.Type: CHANGE_OF_VALUE

SYNTACTIC LEVEL
SynU_aumentare_V (Frameset)
  Transitive structure: P0 P1 P2
  Intransitive structure: P0 P1

from N. Ruimy


SYNTAX-SEMANTIC MAPPING
CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME

PRED_aumentare
  ARG0: Agent
  ARG1: Patient
  ARG2: Undersc.

SemU1_aumentare: CAUSE_CHANGE_OF_VALUE
SemU2_aumentare: CHANGE_OF_VALUE

SynU_aumentare_V
Frameset:
  Transitive structure: P0 P1 P2
  Intransitive structure: P0 P1

isomorphic correspondence:
<Correspondence id="ISOtrivalent" correspargposl="ARG0-P0 ARG1-P1 ARG2-P2"> </Correspondence>

non-isomorphic correspondence:
<Correspondence id="AUG2to3erg9" comment=" Augmented mapping from TWO Position description to THREE argument description. ARG0 not represented in syntax" correspargposl="ARG1-P0 ARG2-P1"></Correspondence>

from N. Ruimy
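
The correspondence strings on this slide can be read mechanically. The sketch below is only an illustration, assuming that `correspargposl` is a whitespace-separated list of `ARGn-Pn` pairs, as in the two elements quoted above. The function names and the isomorphism test are hypothetical and are not part of the SIMPLE tools.

```python
import xml.etree.ElementTree as ET

# The two Correspondence elements from the slide (comment attribute omitted).
ISO_TRIVALENT = ('<Correspondence id="ISOtrivalent" '
                 'correspargposl="ARG0-P0 ARG1-P1 ARG2-P2"> </Correspondence>')
AUG_2_TO_3 = ('<Correspondence id="AUG2to3erg9" '
              'correspargposl="ARG1-P0 ARG2-P1"></Correspondence>')


def parse_correspondence(xml_text):
    """Return (id, {semantic argument: syntactic position}) for one element."""
    element = ET.fromstring(xml_text)
    pairs = element.get("correspargposl").split()
    mapping = dict(pair.split("-") for pair in pairs)
    return element.get("id"), mapping


def unrealised_arguments(mapping, predicate_args):
    """Semantic arguments of the predicate that have no syntactic position."""
    return [arg for arg in predicate_args if arg not in mapping]


def is_isomorphic(mapping, predicate_args):
    """Hypothetical test: every ARGn is linked to the position Pn."""
    return all(mapping.get(arg) == "P" + arg[len("ARG"):]
               for arg in predicate_args)


PRED_AUMENTARE_ARGS = ["ARG0", "ARG1", "ARG2"]

_, transitive = parse_correspondence(ISO_TRIVALENT)
_, intransitive = parse_correspondence(AUG_2_TO_3)

print(is_isomorphic(transitive, PRED_AUMENTARE_ARGS))           # True
print(is_isomorphic(intransitive, PRED_AUMENTARE_ARGS))         # False
print(unrealised_arguments(intransitive, PRED_AUMENTARE_ARGS))  # ['ARG0']
```

In the second mapping ARG0 has no syntactic position, which matches the comment on the AUG2to3erg9 element ("ARG0 not represented in syntax").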


Relations and Predicates

Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>

SemUs linked to the predicate:
  SemU Sell (V)
  SemU Sale (N)
  SemU Seller (N)

Relations between SemUs: Event_noun, Is_the_agent_of


“Predicate - semantic unit(s)” link & Relations

PRED_ACCUSARE <ARG0>, <ARG1>, <ARG2>

Links from the predicate to the semantic units:
  accusare (to accuse): master
  accusatore (accuser): agent nominalisation
  accusa (accusation): process nominalisation
  accusato (accused): patient nominalisation

Relations between semantic units: Is_the_agent_of, Event_noun

from Nilda Ruimy


The SIMPLE ontology

SIMPLE Ontology: multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations

In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information

from Nilda Ruimy


TOP

Top-level types: TELIC, AGENTIVE, CONSTITUTIVE, ENTITY

Second-level types: CONCRETE_ENTITY, ABSTRACT_ENTITY, PROPERTY, REPRESENTATION, EVENT, CAUSE

•Location

•Material

•Artifact

•Food

•Physical Object

•Organic Object

•Living Entity

•Substance

•PART

•GROUP

•AMOUNT

•Quality

•Psych Property

•Physi Property

•Social Property

•Domain

•Time

•Moral Standards

•Cognitive Fact

•Mvmt of Thought

•Institution

•Convention

•Abstract Location

•Language

•Sign

•Information

•Number

•Unit of measure

•Metalanguage

•Human

•Animal

•Vegetal Entity

•Artifact Material

•Furniture

•Clothing

•Container

•Artwork

•Instrument

•Money

•Vehicle

•Semiotic Artifact

Aspectual

Cause Aspect.

Phenomenon

•Weather verbs

•Disease

•Stimuli

State

•Exist

•Rel. State

Act

•Non Rel. Act

•Relational Act

•Move

•Cause Act

•Speech Act

Psychological_event

•Cognitive Event

•Experience Event

Change

•Rel. Change

•Change Possession

•Change Location

•Natural Transition

•Acquire Knowledge

Cause_change

•Cause Rel. Change

•Cause Change Location

•Cause Natural Transition

•Creation

•Give Knowledge

from Nilda Ruimy

The SIMPLE ontology: Multidimensionality


Ontology of Structured Semantic Types: a Template Schema providing a set of structured information crucial to the definition of a semantic type

• Interface between ontology & lexicon
• Guide for the lexicographer

SemU: Identifier of the Semantic Unit
Related SynU: Identifier of the Syntactic Unit the SemU is related to
IWN Base Concept: Number of the corresponding ItalWordNet base concept
Template_Type: [Container]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: General
Semantic Class: Link to the LexiQuest (or any other ontology)
Gloss: Lexicographic gloss
Predicative Representation: Predicate associated with the SemU and its argument structure [container_pred (arg0)]
Arg. Selectional Restrictions: Selectional restrictions (Arg0-HeadQuantified-Substance)
Derivation: Derivational relations between SemUs
Qualia_Formal: isa (1, <container> or <hyperonym>)
Qualia_Agentive: created_by (1, <Usem>: [CREATION]) //definitorial//
Qualia_Constitutive: made_of (1, <Usem>) //optional//
                     has_as_part (1, <Usem>) //optional//
                     contains (1, <Usem>)
Qualia_Telic: used_for (1, <contain>) //definitorial//
              used_for (1, <measure>) //optional//
Synonymy: Synonyms of the SemU //optional//
Regular Polysemy: [Amount] [Container]


Semantic type in the SIMPLE Ontology

Not just a label but rather a classificatory device consisting of a cluster of structured semantic information

Type assignment means endowing a word-sense with a structured set of semantic features and relations with a view to:
• distinguishing it from other senses of the same word
• expressing its similarity with other words
• expressing its relationships to other words
• drawing inferences from this information

Each semantic type is associated with a template, i.e. a schematic structure that contains a cluster of type-defining properties and imposes constraints on lexical items for type membership

Templates: interface between Ontology and Lexicon

Template-driven encoding methodology ensures internal and cross-lexicon consistency

from Nilda Ruimy


Template for the sem. type ‘Instrument’

Information groups: ontological information, predicative representation, extended qualia structure

SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
Template_Type: Instrument
Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: Domain information
Semantic Class: One of WordNet Classes
Gloss: Lexicographic definition
Event Type: Type of event (state, process, transition)
Predicative Representation: Predicate associated with the SemU, and its argument structure
Selectional Restr.: Selectional restrictions on the arguments
Derivation: Derivational relations between SemUs
Formal: Usem_1 isa Usem_2 [Artifact]
Agentive: Usem_1 created_by Usem_2 [Creation]
Constitutive: Usem_1 made_of Usem_2 [Substance] OPTIONAL
              Usem_1 has_as_part Usem_2 [Artifact] OPTIONAL
Telic: Usem_1 used_for Usem_2 [Event]
Synonymy: Synonyms of the SemU
Collocates: Collocate information
Complex: Polysemous class of the SemU

from Nilda Ruimy
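
To make the idea of a template as a set of constraints on type membership more concrete, here is a minimal sketch. It assumes that the qualia relations not marked OPTIONAL in the ‘Instrument’ template are required, which the slide does not state explicitly. The class, the function and the example identifiers (`USem_bisturi`, etc.) are hypothetical.

```python
from dataclasses import dataclass, field

# Qualia relations of the 'Instrument' template. Assumption: relations that
# the slide does not mark as OPTIONAL are treated as required.
INSTRUMENT_TEMPLATE = {
    "isa": {"role": "Formal", "target_type": "Artifact", "optional": False},
    "created_by": {"role": "Agentive", "target_type": "Creation", "optional": False},
    "made_of": {"role": "Constitutive", "target_type": "Substance", "optional": True},
    "has_as_part": {"role": "Constitutive", "target_type": "Artifact", "optional": True},
    "used_for": {"role": "Telic", "target_type": "Event", "optional": False},
}


@dataclass
class SemU:
    """A semantic unit with its template type and its qualia relations."""
    identifier: str
    template_type: str
    relations: dict = field(default_factory=dict)  # relation name -> target SemU id


def missing_relations(semu, template):
    """Required relations of the template that the entry does not encode."""
    return [name for name, spec in template.items()
            if not spec["optional"] and name not in semu.relations]


# Hypothetical, partially encoded entry for 'bisturi' (lancet).
bisturi = SemU(
    identifier="USem_bisturi",
    template_type="Instrument",
    relations={"isa": "USem_strumento", "used_for": "USem_tagliare"},
)

print(missing_relations(bisturi, INSTRUMENT_TEMPLATE))  # ['created_by']
```

A check of this kind is one way a template can act as a guide for the lexicographer: it lists what still has to be encoded before the entry satisfies the type.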


Top
  Formal: Is_a, ...
  Constitutive: Is_a_part_of, Property (Contains, ...), ...
  Agentive: Created_by, Agentive_cause, ...
  Telic: Indirect_telic, Purpose, Instrumental (Used_for, Used_as), Activity (Is_the_habit_of), ...

100 Rels.

The targets of relations identify:
• prototypical semantic information associated with a SemU
• elements of dictionary definitions of SemUs
• typical corpus collocates of the SemU

For a BioLexicon


Qualia Structure

One of the four levels of semantic representation in the theory of Generative Lexicon

Consists of four qualia roles encoding orthogonal dimensions of meaning:
• formal role (general identification)
• constitutive role (composition)
• agentive role (origin)
• telic role (function)


Extended Qualia Structure

FORMAL: isa, antonym_comp, antonym_grad, mult_opposition

CONSTITUTIVE: made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses
  PROPERTY: causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
  LOCATION: is_in, lives_in, typical_location

AGENTIVE: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source
  ARTIFACTUAL AGENTIVE: created_by, derived_from

TELIC: indirect_telic, purpose
  DIRECT TELIC: object_of_activity
  INSTRUMENTAL: used_for, used_as, used_by, used_against
  ACTIVITY: is_the_activity_of, is_the_ability_of, is_the_habit_of

regulates, is_regulated_by, …

Example pairs:
  proiettile, colpire (projectile, hit)
  bisturi, chirurgo (lancet, surgeon)
  medico, curare (doctor, cure)
  disgusto, provare (disgust, feel)
  casa, costruire (house, build)
  mohair, capra (mohair, goat)
  pane, farina (bread, flour)
  senatore, senato (senator, senate)
  manubrio, bicicletta (handlebar, bicycle)


“Extended” Qualia Structure

FORMAL: is_a, antonym_comp, antonym_grad, mult_opposition

CONSTITUTIVE: made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses
  PROPERTY: causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
  LOCATION: is_in, lives_in, typical_location

AGENTIVE: result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source
  ARTIFACTUAL AGENTIVE: created_by, derived_from

TELIC: indirect_telic, purpose
  DIRECT TELIC: object_of_activity
  INSTRUMENTAL: used_for, used_as, used_by, used_against
  ACTIVITY: is_the_activity_of, is_the_ability_of, is_the_habit_of

regulates, is_regulated_by, …

Example pairs:
  T-cell, Blood Stem Cell
  Ribose, Nucleotide
  Catalyze, Enzyme

NEW!


Meaning dimensions expressed by Qualia relations

botte (barrel): traditional dictionary definition

recipiente (container) → Formal: isa
di legno (of wood) → Constitutive: made_of
fatto (made) → Agentive: created_by
di doghe arcuate tenute unite da cerchi di ferro (of curved staves held together by iron hoops) → Constitutive: made_of
che serve per la conservazione e il trasporto (used for storage and transport) → Telic: used_for
di liquidi, specialmente vino (of liquids, especially wine) → Constitutive: contains

from Nilda Ruimy
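
As a small illustration of how a dictionary definition decomposes into qualia relations, the sketch below stores each fragment of the definition of botte with a role and a relation and groups the fragments by role. The pairing of fragments with relations is reconstructed from the slide layout, and the data structure itself is an assumption, not the SIMPLE encoding format.

```python
from collections import defaultdict

# (definition fragment, qualia role, relation); pairing reconstructed from the slide.
BOTTE_DEFINITION = [
    ("recipiente", "Formal", "isa"),
    ("di legno", "Constitutive", "made_of"),
    ("fatto", "Agentive", "created_by"),
    ("di doghe arcuate tenute unite da cerchi di ferro", "Constitutive", "made_of"),
    ("che serve per la conservazione e il trasporto", "Telic", "used_for"),
    ("di liquidi, specialmente vino", "Constitutive", "contains"),
]


def group_by_role(fragments):
    """Collect (relation, fragment) pairs under their qualia role."""
    grouped = defaultdict(list)
    for text, role, relation in fragments:
        grouped[role].append((relation, text))
    return dict(grouped)


for role, items in group_by_role(BOTTE_DEFINITION).items():
    print(role)
    for relation, text in items:
        print("  ", relation, ":", text)
```

Grouping by role shows that one definition touches all four qualia dimensions, with three fragments falling under the Constitutive role.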


Multidimensional Knowledge Bases … by using Lexical Resources

Ala (wing)

SemU: 3232, Type: [Part], Parte di aeroplano (part of an aeroplane)
  part_of: aeroplano; used_for: volare; agentive: fabbricare
SemU: 3268, Type: [Part], Parte di edificio (part of a building)
  part_of: edificio; agentive: fabbricare
SemU: D358, Type: [Body_part], Organo degli uccelli (organ of birds)
  part_of: uccello; used_for: volare
SemU: 3467, Type: [Role], Ruolo nel gioco del calcio (role in football)
  isa: giocatore; member_of: squadra


Semantic Multidimensionality & NLP

NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning:

Extended Qualia Relations
• Is_a_part_of: la pagina del libro (the page of the book)
• Member_of: il difensore della Juventus (Juventus fullback)
• Telic: il suonatore di liuto (the lute player)
• Made_of: il tavolo di legno (the wooden table)


Disambiguation = Interpretation of conceptual relations in context

duna di sabbia (sand dune): ?
bicchiere di birra (glass of beer): ?
fetta di pane (slice of bread): ?

Candidate relations: made_of, is_a_part_of, contains

ONTOLOGY: … SUBSTANCE … ARTIFACTUAL_DRINK … liquid

from Nilda Ruimy
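
One way to picture this kind of disambiguation is a lookup that chooses a relation from the semantic types of the head and of the modifier. The sketch below is purely illustrative: the type assignments, the rules and the resulting readings are assumptions made for the example (the slide leaves the three readings open), and none of it is taken from the SIMPLE lexicon.

```python
# Hypothetical mini-lexicon: the semantic types are illustrative only.
SEMANTIC_TYPE = {
    "duna": "Physical_object",
    "sabbia": "Substance",
    "bicchiere": "Container",
    "birra": "Artifactual_drink",
    "fetta": "Part",
    "pane": "Food",
}

# Hypothetical rules, tried in order: (head type, modifier types, relation).
# None means "any type".
RULES = [
    ("Container", {"Artifactual_drink", "Substance", "Food"}, "contains"),
    ("Part", None, "is_a_part_of"),
    (None, {"Substance"}, "made_of"),
]


def interpret(head, modifier):
    """Return the relation assumed to hold in 'head di modifier', or None."""
    head_type = SEMANTIC_TYPE[head]
    modifier_type = SEMANTIC_TYPE[modifier]
    for rule_head, rule_modifiers, relation in RULES:
        if rule_head is not None and rule_head != head_type:
            continue
        if rule_modifiers is not None and modifier_type not in rule_modifiers:
            continue
        return relation
    return None


print(interpret("duna", "sabbia"))      # made_of
print(interpret("bicchiere", "birra"))  # contains
print(interpret("fetta", "pane"))       # is_a_part_of
```

The same preposition di receives three different readings here only because the ontological types differ, which is the point the slide makes about interpreting conceptual relations in context.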


Domain - Semantic class

[Diagram: network of words of different semantic types linked through qualia relations to mangiare (to eat) and to cooking verbs]

TELIC relations in the diagram: Used_for, Object_of_the_activity, Is_the_activity_of
AGENTIVE relation in the diagram: Created_by

Nouns with their semantic type: tavola [FURNITURE]; forchetta, posata [INSTRUMENT]; ristorante [BUILDING]; pentola [CONTAINER]; cuoco [PROFESSION]
Other nouns: mestolo, friggitrice, bollitore, pesciera, pesce
Food nouns: coniglio, carne, mela, carota, arrosto
Food types: ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD
+edible: zucchero, alloro, tartufo
Types of the +edible nouns: VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE
Verbs: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, …

from Nilda Ruimy


Noun Compounds/Complex Nominals … are pervasive

There is a motivation in most N+N constructions: the context provides it

The FrameNet (SIMPLE) way:
• appeal to specific frame structures (qualia structures) associated with the head noun
• determine from corpus attestations which frame elements (qualia) can get instantiated as a modifier word

“container”: complex nominals can specify:
• material (aluminium c., glass c., …)
• contents (food c., trash c., …)
• size (3 quart c., …)
• function (shipping c., storage c., …)
• ...


Noun Compounds/Complex Nominals & multidimensional semantic approaches

a. FrameNet
“Container” Frame Structure: Frame Elements:
• Material: aluminum container, glass c., metal c., tin c.
• Contents: food container, beverage c., trash c., water c., milk c., fuel c.
• Size: 3 quart container
• Function: shipping container, storage c.

b. SIMPLE
Qualia Relations of "container" as used in compounds:
• Constitutive: made_of [MATERIAL]: aluminum container, glass c., metal c., tin c.
• Telic: contains [ENTITY]: food container, beverage c., trash c., water c., milk c., fuel c.
• Constitutive: size [QUANTITY]: 3 quart container
• Telic: is_used_for [EVENT]: shipping container, storage c.


Complex Nominals

E.g. knife (coltello) triggers:
• a “cutting frame” (FrameNet)
• specific (SIMPLE) dimensions of meaning

SIMPLE Extended Qualia structure for the interpretation of the semantic relation betw. Ns (internal relational structure of MWE)

PP disambig.:
• butcher’s knife (coltello da macellaio): TELIC (used_by) Y [Human], PPda
• plastic knife (coltello di plastica): CONST (made_of) X [Material], PPdi
• table knife (coltello da tavola): TELIC (used_in) Z [Location], PPda
• hunting knife (coltello da caccia): TELIC (used_in_activity) E [Activity], PPda
• piatto di legno (wooden plate): CONST (made_of) X [Material], PPdi
• piatto di pasta (plate of pasta): CONST (contains) X [Food], PPdi


Deverbal nominalisation:

o noun murder (uccisione, delitto, omicidio) (different sem. pref.)
    PPdi
    PPda_parte_di, di

o verb murder (uccidere)
    subj: NP
    obj: NP

:instr: PPcon [Weapon] (knife m., con coltello)
:means: PPper [Action] (strangulation m., per strangolamento)
:loc: PPloc|di [Location] (Kent State murders, nel ...)
:time: PPtime|di [Time] (1983 murders, del 1983)

SIMPLE: possible extension
As if it were a Situation

PRED: MURDER (uccidere)
ARG1: agent [Hum/Anim?]
ARG2: patient [Hum/Anim?]
MOD1: instr [Weapon]
MOD2: means [Action]
MOD3: ... […]


Ontologisation of SIMPLE

Automatically converting and enriching a computational lexicon into a formal Ontology

For NLP semantic tasks

Potential of ontologies in NLP as:
• Backbone in LKBs
• Pivot in multilingual architectures (e.g. KYOTO)
• Reasoning capabilities

Ontologisation of SIMPLE into OWL:
• Conversion of the SIMPLE ontology
• Bottom-up enrichment: promoting lexicon knowledge to the ontology level
• Language independent knowledge from Italian lexico-semantic information

from Antonio Toral


Named Entity Repository

Automatically build LRs from existing LRs and Web 2.0 semi-structured resources. Combine:
• Authoritative lexicographic experience → precision
• Collaborative “wisdom of the crowds” → recall

Case study: Multilingual NE repository from LRs (en WN, es WN, it SIMPLE) & Wikipedia
• NEs linked to three LRs and two ontologies (SUMO, SIMPLE)
• Interoperable resource: LMF compliant
• Applied to cross-lingual QA (validate answers): prec. +16.3%

from Antonio Toral


Use of SIMPLE Lexicon & Ontology for Time and Event detection/annotation

Different PoS may realise an event: verbs, nouns, adjectives, prep. phrases

The SIMPLE Lexicon helps in identifying & classifying Events (eventive nouns & adjectives) → in a 10K Words Annotation Experiment:
• each event is associated with an Ontological Type
• the Event-Type from the SIMPLE-Ontology can be used as default value to provide event composition, and consequently to instantiate a temporal representation for each Event
• improvement both in identification & classification of Events by annotators: 81.17% accuracy (vs. 72.35%) and K-coefficient = 0.84 (vs. 0.7)

Components in the diagram: Morpho-Syntactic Analysis; SIMPLE Lexicon; Event Detection & Classification

from Tommaso Caselli
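
The slide does not say which K-coefficient was used. As a hedged illustration of how chance-corrected agreement differs from raw agreement, the sketch below computes Cohen's kappa for two annotators. The toy labels are invented and are not the data of the annotation experiment.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented toy annotation of four events with two event types.
annotator_1 = ["state", "state", "process", "process"]
annotator_2 = ["state", "state", "process", "state"]

print(cohen_kappa(annotator_1, annotator_2))  # 0.5
```

In this toy case the raw agreement is 0.75, while kappa is 0.5 because half of the agreement would be expected by chance. This is why the slide reports accuracy and the K-coefficient as separate figures.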


Mapping SIMPLE Semantic Types to TimeML Classes

from Tommaso Caselli


GLML – Generative Lexicon Markup Language
with James Pustejovsky, Olga Batiukova, Anna Rumshisky, Marc Verhagen

Annotating texts with Argument Selection, Argument Coercion, & Qualia Roles

The corpus brings reality to the model, provides statistical cues to improve language models

Lexical semantic info, like type coercion/selection, required for applications such as WSD, categorisation, IR (query reformulation, filtering…), IE (coreference resolution, relation extraction…), entailment, ...

Predicate – Argument constructions
• Predicate Sense Disambiguation
• Argument selection: type selection / coercion
• Qualia role/relation selection

Modification constructions
• Noun Sense Disambiguation
• Qualia role/relation selection in Adjectival Modification
• Qualia role/relation selection in Nominal Modification

Complex Types
• Type selection in modification of Dot Objects

from Valeria Quochi


Using Existing Resources for Italian

SIMPLE Lexicon & Ontology / ItalWordNet
• Sense Disambiguation
• Type selection / coercion
• Type selection in Dot Objects

SIMPLE Extended Qualia Structure
Selection of Qualia roles/relations, e.g.
• Constitutive Relations, e.g. Is_a_part_of, Is_a_member_of
• Telic Relations, e.g. Purpose, Object_of_the_activity
• Agentive Relations, e.g. Source, Result_of

from Valeria Quochi


Ontology & Lexicon

Today we can easily say that ontology learning, i.e. the practical feasibility of supporting knowledge acquisition in a domain, depends on developing automatic methods for acquiring conceptual representations from natural language text

Semantic Web initiatives are also focussing on the building of ontological representations from texts, and in this respect show a large amount of conceptual overlap with the notion of a dynamic lexicon

Lexicon & Corpus

Based on various experiences, and as a work strategy for lexical/textual resources:

We should push towards innovative types of lexicons: a sort of ‘example-based living lexicons’ that share properties of both lexicons and corpora

In such a lexicon redundancy is not a problem, but rather a benefit


BUT… Mismatch between LRs and LT

Often a gap between advancement in LRs and LT
• Either adequate LRs are missing … or there are no systems able to use “knowledge intensive” LRs effectively

Shortcomings:
• lack of usable implementations fully exploiting new types of LRs
• LR claims are not empirically evaluated

A parallel evolution of R&D for both LRs and LT is needed


Phenomena to be represented / What is missing??

from Ed Hovy

1. Bracketing / grouping of predications around entities (basic frame structure)
2. Concepts:
   • Choice of meaning/sense, with frames in some cases
   • Definition and nature of concept repository / ontology
   • Major high-level concept groupings and classes
3. Labels on (dependency) arcs (thematic roles, types of attributes, modifiers, etc.)
4. Coreference (explicit and indirect):
   • intra-sentential
   • inter-sentential and cross-document
5. Information Structure and Discourse structure:
   • theme-rheme and topic-focus
   • salience
   • coordination
   • nonsemantic inter-clausal relations (RST’s interpersonal ones)
   • etc.

Status labels on the slide (their position next to the items is not recoverable): done, done, done??, done??


Phenomena to be represented / What is missing?? (Ed Hovy)

6. Pragmatics:
   Speech Acts
   Participants and audience modeling
   Modality:
      Epistemic modalities
      Deontic modalities
      Personal attitudes
   Deixis / reference to external world (or databases)
   Social register, genre, and style

7. Polarity (including scoping)

8. Microtheories (many of them to be incorporated elsewhere)
   Time (Reichenbach)
   Space (OWL upper ontology of space, etc.)
   Cardinality
   Quantification
   Manner
   Degree and comparison
   Possession
   Existentials
   Copular constructions
   Conditionals
   Consequences and inference
   Co-text and intertextuality (including formatting and other media)
   Meaning of prosody and other speech-related effects

done??

done??

Towards a common encoding policy???


Lexicon and Corpus: a multi-faceted interaction

L → C  tagging
C → L  frequencies (of different linguistic “objects”)
C → L  proper nouns, acronyms, …
L → C  parsing, chunking, …
C → L  training of parsers
C → L  lexicon updating
C → L  “collocational” data (MWE, idioms, gram. patterns ...)
C → L  “nuances” of meanings & semantic clustering
C → L  acquisition of lexical (syntactic/semantic) knowledge
L → C  semantic tagging/word-sense disambiguation (e.g. in Senseval)
C → L  more semantic information on LE
C → L  corpus based computational lexicography
C → L  validation of lexical models
C → L  …
L → C  ...


… Dynamic lexicons

Current computational lexicons (even WordNets) are static objects, still shaped on traditional dictionaries

Towards a flexible model of dynamic lexicon
   extending the expressiveness of a core static lexicon
   adapting to the requirements of language in use as attested in corpora
   with semantic clustering techniques, etc.

Convert the extreme flexibility & multidimensionality of meaning into large-scale and exploitable (VIRTUAL?) resources

Sort of Example-based Lexicon: a “Lexicon & Corpus” together

BUT


Verb/Arguments Interaction at the Lexical-Semantic Level

Verb meaning determines/selects the ‘sense’ of its subject and/or direct object

e.g. arrestare, both ‘to arrest’ & ‘to stop’, selects direct objects which have themselves, or receive from the verb, a negative connotation

Dobj            Sem.type           Conn.Feat.
ladro1          agent_temp_act     neg
spacciatore1    agent_temp_act     neg
trafficante1    agent_temp_act     neg
traffico2       act                neg
invasione1      cause_act          neg
massacro1       cause_nat_trans    neg
inflazione1     event              neg
pregiudicato1   human              neg
balordo1        human              neg
maniaco1        human              neg
strozzino1      agent_temp_act     neg
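The table above can be read as a small data structure. The following is only a minimal sketch, not the encoding used in any actual ILC lexicon: the dictionary layout, the function name `connotation_source` and the sense id `marito1` are assumptions made for illustration. It shows the two cases named on the slide: an object that carries the negative connotation itself, and one that would receive it from the verb.

```python
# Direct-object senses of "arrestare", copied from the table:
# sense id -> (semantic type, connotation feature)
DOBJ_SENSES = {
    "ladro1": ("agent_temp_act", "neg"),
    "spacciatore1": ("agent_temp_act", "neg"),
    "trafficante1": ("agent_temp_act", "neg"),
    "traffico2": ("act", "neg"),
    "invasione1": ("cause_act", "neg"),
    "massacro1": ("cause_nat_trans", "neg"),
    "inflazione1": ("event", "neg"),
    "pregiudicato1": ("human", "neg"),
    "balordo1": ("human", "neg"),
    "maniaco1": ("human", "neg"),
    "strozzino1": ("agent_temp_act", "neg"),
}


def connotation_source(dobj_sense: str) -> str:
    """Say where the negative connotation of a direct object of 'arrestare' comes from.

    Hypothetical helper: 'lexicon' if the sense is listed as 'neg',
    'verb' if the connotation would have to be received from the verb.
    """
    entry = DOBJ_SENSES.get(dobj_sense)
    if entry is not None and entry[1] == "neg":
        return "lexicon"
    return "verb"


# The connotation feature cuts across semantic types.
SEMANTIC_TYPES = {sem_type for sem_type, _ in DOBJ_SENSES.values()}
```

In this toy form the listed objects span six different semantic types, which suggests that the connotation feature, rather than the semantic type, is what the verb selects on.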


Complexity of Word Sense in context: many potential clues

A particular meaning (of a verb) may be selected by:

A specific syntactic pattern
   comprendere + that-clause = ‘to understand’ [not = ‘to include’]
   aprire + PP introduced by a (preferably with “human” head) = ‘to be ready, open, well disposed towards someone’ (e.g. Cossiga apre a La Malfa)

The semantic type of subjects, dir. objects, ind. objects
   human subject (if not collective type) always selects the meaning ‘to understand’ of the verb comprendere

The domain of use
   perseguire un reato ‘to prosecute a crime’ (domain=law)

A specific modifier
   perseguire penalmente ‘to prosecute at the penal level’, not ‘to pursue (a goal)’
   comprendere benissimo ‘to understand very well’, not ‘to include’
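The clue types listed above can be sketched as a handful of hand-written rules. This is a toy illustration only: the `Context` fields, the function name `select_sense` and the rule order are assumptions, and the rules cover just the comprendere and perseguire examples from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Context:
    """Hypothetical description of one verb occurrence."""
    lemma: str
    complement: Optional[str] = None    # syntactic pattern, e.g. "that-clause"
    subject_type: Optional[str] = None  # semantic type, e.g. "human", "collective"
    modifier: Optional[str] = None      # e.g. "benissimo", "penalmente"
    domain: Optional[str] = None        # domain of use, e.g. "law"


def select_sense(ctx: Context) -> Optional[str]:
    """Return the sense chosen by the first clue that fires, or None if no clue applies."""
    if ctx.lemma == "comprendere":
        if ctx.complement == "that-clause":   # syntactic pattern
            return "to understand"
        if ctx.subject_type == "human":       # semantic type of the subject
            return "to understand"
        if ctx.modifier == "benissimo":       # specific modifier
            return "to understand"
    if ctx.lemma == "perseguire":
        if ctx.domain == "law" or ctx.modifier == "penalmente":
            return "to prosecute"
    return None
```

For example, `select_sense(Context("perseguire", modifier="penalmente"))` returns `"to prosecute"`, while an occurrence of comprendere with no recorded clue returns `None` and stays undecided.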

Two different senses of a lemma cannot be selected simultaneously in the same context

BUT…


Complexity of Word Sense identification

The problem: there are no sure tests; each has only partial validity & is not completely discriminating

Moreover, it’s not easy to predict when to apply which test

Word Sense Disambiguation (WSD) in different contexts is better achieved using info types at different levels of linguistic description: morphosyntactic/syntactic/semantic/pragmatic…, even multilingual

BUT it is a-priori unpredictable where the “clue” is


Complexity of Word Sense & use of Corpora

The availability of large quantities of semantically tagged corpora helps to
   analyse the impact of different “clues” to perform WSD in different contexts
   study the interaction of clues belonging to different levels of linguistic description, to improve WSD strategies

not just statistics!!

Automatically acquire syntactic, semantic, collocational (lexical) ‘indicators’ which can help in the identification of a word-sense

‘List’ them in the lexicon??
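As a rough illustration of what acquiring collocational indicators from a sense-tagged corpus could look like, here is a count-based sketch. The four tagged occurrences are invented toy data, the sense labels and the function name `acquire_indicators` are assumptions, and a word is kept as an indicator only if it co-occurs with exactly one sense. It is deliberately the purely statistical baseline, which the slide warns is not the whole story.

```python
from collections import Counter, defaultdict

# Invented toy data: (sense label, context lemmas) for occurrences of "perseguire".
TAGGED = [
    ("prosecute", ["perseguire", "penalmente", "reato"]),
    ("prosecute", ["perseguire", "reato"]),
    ("pursue", ["perseguire", "obiettivo"]),
    ("pursue", ["perseguire", "obiettivo", "comune"]),
]


def acquire_indicators(tagged, target):
    """Map each sense to the context words that occur with that sense only."""
    counts = defaultdict(Counter)
    for sense, tokens in tagged:
        counts[sense].update(t for t in tokens if t != target)

    indicators = {}
    for sense, counter in counts.items():
        # Words seen with any other sense are not discriminating.
        seen_elsewhere = set()
        for other, other_counter in counts.items():
            if other != sense:
                seen_elsewhere.update(other_counter)
        indicators[sense] = sorted(w for w in counter if w not in seen_elsewhere)
    return indicators


INDICATORS = acquire_indicators(TAGGED, "perseguire")
```

Whether such acquired indicators should then be listed in the lexicon entry is exactly the open question raised on the slide.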


Problem of regular polysemy … and more

BUT… actual occurrence of “two senses” in the same context…

e.g. both act & result (for deverbal nouns, etc.)

In una comunicazione al Parlamento la Commissione ha illustrato le sue riflessioni su …
(In a communication to Parliament the Commission set out its reflections on …)

Berlusconi dovrà scegliere se fare l’uomo di governo o mantenere il controllo delle sue tv
(Berlusconi will have to choose whether to be a man of government or keep control of his TV channels)

Underspecified meanings? maybe subsuming more granular distinctions, to be used only when disambiguation is feasible/useful in a context

Theoretical language, “invented” by lexicographers/linguists who have/want to classify in disjoint classes,
vs.
actual usage: a “continuum” resistant to clear-cut disjunctions, by necessity ambiguous wrt imposed classifications


… what cannot be easily encoded at the Lexical-Semantic Level

In a “Senseval” framework …

When sense interpretation requires appeal to extra-linguistic knowledge (not to be captured at the lexical-semantic level of description)

When corpus annotation either diverges from the lexical resource or further specifies it

   words acquiring a specific sense, strictly dependent on the context
      la donna Pauline Collins, che ha già visto arrestare il marito dai tedeschi,…
      (the woman Pauline Collins, who has already seen her husband arrested by the Germans, …)

   variety of nuances of a verb, e.g. according to co-occurring dir.obj. sem-type

   metaphors extended to an entire sentence
      l’auto verde arriva sul tavolo del governo
      (lit. the green car arrives on the table of the government)

   ...

Not all these “shifts of meanings” can/must be captured through lexical-semantic annotation

e.g.


Wrt Senseval: jargon, neologisms, evaluative suffixation, ‘titles’, …

   vetturetta
   minitaxi
   fumantino (adj., as in una persona fumantina)
   komeinista
   …

   Primula rossa (= mafia boss)
   Scarpa d'oro (= a good player)
   …

Not in any lexicon…

a semantic type easier to assign than a word-sense in a lexicon


Compounds and idioms

   uscire di scena
   farla franca
   fare fuoco
   andare in onda
   …
   fare [in tempo]
   andare [a piedi]
   essere [in testa] (= to be first)
   vincere [per un soffio]
   partire [a razzo]

   Croce Rossa
   Caschi Blu
   conflitto a fuoco
   atletica leggera
   famiglia bene
   un bagno di folla
   …

Where is the boundary of the MWE?
"andare_a_piedi" vs. andare (Pos V) a_piedi (Pos Adv.loc)?
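The boundary question can be made concrete with a small greedy longest-match tokenizer. This is a sketch under stated assumptions: the function `tokenize`, the tuple-keyed MWE lexicons and the use of `None` for tokens left to ordinary word-level tagging are all invented for illustration. The same three tokens come out differently depending on which MWE the lexicon lists.

```python
def tokenize(tokens, mwes):
    """Group tokens into multi-word expressions by greedy longest match.

    `mwes` maps a tuple of tokens to a POS label (hypothetical format).
    Tokens that are not part of a listed MWE are returned with POS None.
    """
    out = []
    i = 0
    max_len = max((len(key) for key in mwes), default=1)
    while i < len(tokens):
        # Try the longest candidate first, down to length 2.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in mwes:
                out.append(("_".join(candidate), mwes[candidate]))
                i += n
                break
        else:
            # No MWE starts here: keep the single token.
            out.append((tokens[i], None))
            i += 1
    return out


SENTENCE = ["andare", "a", "piedi"]

# Analysis 1: the whole verbal expression is one lexical unit.
AS_VERBAL_MWE = tokenize(SENTENCE, {("andare", "a", "piedi"): "V"})

# Analysis 2: only the adverbial locution is a unit; the verb stays separate.
AS_ADVERBIAL_MWE = tokenize(SENTENCE, {("a", "piedi"): "Adv.loc"})
```

The first call yields a single unit `andare_a_piedi`, the second yields `andare` followed by `a_piedi`; nothing in the procedure itself says which lexicon is the right one.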


Locutions and Figurative usages

   per carità
   in questione
   per caso
   in lizza
   a volontà
   a buon mercato
   …
   ci mancherebbe!
   c'è mancato poco
   …

   due lavoratori su tre sono a casa (= to be unemployed)
   [the collocation with ‘lavoratori’ disambiguates the expression]

   uomo [di polso]
   zona medaglia d'oro (= among the first)
   a cielo aperto (discarica a ..)
   la bella vita (fare …)
   …

If the individual components are annotated, the semantic contribution of the MWE is lost:
acquistare un oggetto a buon (Pos A) mercato (Pos S) !!


Usual issues: “Is there a fixed set of senses?” or “Do senses exist as separate objects?”

Criteria for sense distinction
   very application-dependent
   greater vs. lesser granularity
   depend on the task/domain/situation/etc., i.e. the communication purpose
   & there is no inherently “true” (upper or lower) limit to the granularity ...

A “checklist theory of meaning” is impossible: meaning as a “piece of information” with an autonomous status independent of its use

Computational resources should provide
   multi-dimensional information
   the highest expressiveness in terms of sense-discriminating power
   contextual information

Are we dealing with semantic annotation in the right way??


Divergences betw. Lexicon encoding & Corpus annotation

In the lexicon
   senses are “de-contextualized” (a necessity to capture generalizations)
   sense discrimination must be kept “under control”
   clustering (manually or automatically)

In the corpus sense annotation task
   contextualization plays a predominant role
   calls for a range of pragmatic issues
   corpus analysis per se would lead to excessive granularity of sense distinctions

Capture just the core basic distinctions in a core lexicon & acquire additional, more granular info (usu. of collocational nature) from corpora, to be encoded within the broader senses, e.g. to help translation

not yet solved
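One possible shape for the "core lexicon plus corpus-acquired detail" idea is sketched below. It is an assumption-laden illustration, not a proposed standard: the class names `CoreSense` and `Entry`, the `attach` method and the choice of storing collocates as counts are all hypothetical. The point it shows is that corpus evidence is recorded inside an existing broad sense instead of creating new senses.

```python
from dataclasses import dataclass, field


@dataclass
class CoreSense:
    """A broad, de-contextualized sense with corpus-acquired collocates inside it."""
    gloss: str
    collocations: dict = field(default_factory=dict)  # collocate -> corpus count


@dataclass
class Entry:
    lemma: str
    senses: list

    def attach(self, sense_gloss: str, collocate: str) -> None:
        """Record one corpus observation under an existing core sense."""
        for sense in self.senses:
            if sense.gloss == sense_gloss:
                sense.collocations[collocate] = sense.collocations.get(collocate, 0) + 1
                return
        # No new sense is created on the fly: granularity stays under control.
        raise KeyError(sense_gloss)


entry = Entry("perseguire", [CoreSense("to prosecute"), CoreSense("to pursue")])
entry.attach("to prosecute", "penalmente")
entry.attach("to prosecute", "penalmente")
entry.attach("to prosecute", "reato")
```

After these calls the entry still has two senses, and the finer-grained information sits under ‘to prosecute’ where, for instance, a translation component could consult it.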


Between LRs and Linguistics:

A consequence of the corpus-based approach is that it compels us to break hypotheses too easily taken for granted in mainstream linguistics

In actual usage a characteristic of language is to display many properties which behave as a continuum, not as “yes/no” properties

The same holds true for so-called “rules”: we find more frequently “tendencies” towards a rule than precise rules

Many of the theoretical rules appear to be simplifications or idealisations in fact dispelled by real usage

A number of dichotomies must then be reconciled

Lesson learned: [IN-]Adequacy of Lexical resources

There is still a long way to go before we can recognise & integrate the many dimensions relevant to content interpretation


A number of “dichotomies”: not as opposite views, but as complementary perspectives

Language as a continuum:

   rules vs. tendencies
   absolute constraints vs. preferences
   discreteness vs. continuum/gradedness
   theoretical/potential vs. actual
   intuition/introspection vs. empirical evidence
   theory-driven vs. data-driven
   symbolic vs. statistical

the right-hand side must be highlighted first, and the two then combined

Choices on the syntagmatic axis are pervasive

Lexicon & Corpus must converge