N. Calzolari, Dottorato (doctoral course), Pisa, May 2009
Nicoletta Calzolari
Istituto di Linguistica Computazionale - CNR - Pisa
glottolo@ilc.cnr.it
Language Resources (lexicons, corpora, ontologies, …)
Standards and language technologies
With many others at ILC
Why were such needed LRs lacking after 30 years of R&D in the field?
Old slide with Antonio Zampolli (’80s/early ’90s)
1) Because the main trend until the mid-’80s was to privilege the processing of “critical” phenomena, studied by the dominating linguistic theories, rather than the deep analysis of the real uses of a language. As a result, CL was focusing on:
few examples, often artificially built
lexicons made of few entries (toy lexicons)
grammars with poor coverage
2) Because large-scale LRs are costly and their production requires a big organizing effort
Why do we still lack them?
Historical notes
Pioneering Research
The beginnings… After many years of complete disregard (or even disdain and contempt) for LRs, due mainly to the prevalence and influence of the generativist school
Work on Machine Readable Dictionaries:
Early interest:
To become machine-tractable
To extract info from them, with much less powerful tools than now
Precursor of the trend of automatic acquisition from corpora
Acquilex (Pisa et al.)
Work on/with Longman dictionary (Las Cruces)
NSF & EC International Cooperation grant, promoted by Wilks, Zampolli, Calzolari (Las Cruces & Pisa)
Don Walker & Antonio Zampolli
… back from the ’70s/’80s
Automatic acquisition of lexical information from MRDs
Was at the centre of activities in the Pisa group, Amsler, Briscoe, Boguraev, Wilks’ group, IBM, then Japanese groups, …
The trend was: “large-scale computational methods for the transformation of machine readable dictionaries (MRDs) into machine tractable dictionaries”
It became evident that:
Part of the results of meaning extraction, e.g. many meaning distinctions, which could be generalised over lexicographic definitions and automatically captured, were unmanageable at the formal representation level, and had to be blurred into unique features and values.
Unfortunately, it is still today difficult to constrain word-meanings within a rigorously defined organization: by their very nature they tend to evade any strict boundaries
The lexicon has become ever more relevant
Both international and national authorities started investing in the field as never before, interested in technologies & systems which are really working and are economically interesting
The need for empirical methods, based on the analysis of large amounts of data, has been recognized
LRs must be robust enough for analysing the concrete uses of a language, whether theoretically “interesting” or not
After that pioneering era, production & use of adequate LRs strongly increased
Data-driven approaches
LRs have acquired larger resonance in the last 2 decades, when many activities, in Europe and world-wide, have contributed to substantial advances in knowledge and capability of how to represent, create,
acquire, access, exploit, harmonise, tune, maintain, distribute, etc. large lexical and textual repositories
In Europe an essential role was played by the EC, through initiatives such as NERC, PAROLE, SIMPLE, EuroWordNet, EAGLES, ISLE, ELSNET, RELATOR, …
that saw the participation of many EU groups, linked over the years by sharing common approaches and visions
Since then …
After acquisition from MRDs, … back from the late ’80s
Automatic acquisition of info from texts:
This trend has become today a consolidated fact, and we have moved from focusing on acquisition of “linguistic information” (as at the beginning) to broader acquisition of “general knowledge”, with more data intensive, robust, reliable methods
We started building:
LRs as necessary infrastructure (Lexicons/Corpora), both for research & applications:
LRs give NLP systems the knowledge needed for the various kinds of linguistic processing
Realising that most of the needed information escapes individual “introspection” and can only be acquired by analysing large textual corpora attesting language use in different fields/communicative contexts
BUT there is a need for adequate models to handle actual usage of language
By-product?: Importance of statistical methods
Lesson: Going from core sets to large coverage has implications not just in quantitative terms, but more interestingly in terms of changes to the models and the strategies of processes
What have we (LT & LR) been assembling for many years?
Lexicons & their Ontologies
Written, Spoken, ItalWordNets, PAROLE/SIMPLE, FrameNets, …
Annotated corpora/Treebanks
Basic Tools
Integrated Architecture for:
Annotation at various levels (from morph. to conceptual)
Acquisition/learning
Classification
Ontology creation
…
Methodologies
Know-how & expertise
Infrastructural bodies (on which to build)
Standards
… components of a very large infrastructure of LRs & LT
History: Some international LRs initiatives
ACQUILEX [since ’88], MULTILEX, ET-7, ET-10, TEI, NERC, RELATOR, ONOMASTICA, MULTEXT, COLSIT, LSGRAM, DELIS, EAGLES, PAROLE, SIMPLE, SPARKLE, ELSNET, EuroWordNet, MATE, NITE, Cluster 488 (Italian), TAL (Italian), ISLE, ENABLER, INTERA, LIRICS, …
Senseval/Semeval, WRITE, Forum TAL (Italian), …
ISO, ELRA, LREC, LRE Journal, NEDO, Language Grid, BootStrep, KYOTO, …
Essential role of EC to start a basic Infrastructure
EU at the forefront in the areas of LRs and standards in the ’90s
Established a model
Today: a broad “potential” Infrastructure
RELATOR, EAGLES/ISLE, ENABLER, ELSNET, TELRI, INTERA, LIRICS, …, ELRA
BLARK, Unified Lexicon (W/S)
LREC, LRE journal, …, ERANET-LangNet, …
LDC & others, ISO, COCOSDA/WRITE, US Cyberinfrastructure
Japan COE21, NEDO, Language Grid, …
EU, International, National, …
Cooperative initiatives, links to… FLaReNet (ICT), CLARIN (ESFRI)
Vitality & success signs… for LRs
WordNets: Synsets linked by semantic relations
{casa, abitazione, dimora}
Hyperonym: {edificio, ..}
Hyponym: {villetta} {catapecchia, bicocca, ..} {cottage} {bungalow}
Role_location: {stare, abitare, ...}
Role_target_direction: {rincasare}
Role_patient: {affitto, locazione}
Mero_part: {vestibolo} {stanza}
Holo_part: {casale} {frazione} {caseggiato}
{home, domicile, ..} {house}
TOP Concepts: Object, Artifact, Building
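A minimal sketch of how the synset shown on this slide could be held in memory, assuming a plain Python dataclass rather than any actual ItalWordNet storage format. The class name `Synset`, the method `related` and the lower-cased relation keys are hypothetical; members elided on the slide with ".." are simply left out.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A group of synonymous lemmas plus typed links to other synsets."""
    lemmas: tuple
    # relation name -> list of target synsets, each given as a tuple of lemmas
    relations: dict = field(default_factory=dict)

    def related(self, relation):
        """Return the target synsets of one relation (empty list if none)."""
        return self.relations.get(relation, [])


# Data copied from the slide; elided members ("..") are not invented.
casa = Synset(
    lemmas=("casa", "abitazione", "dimora"),
    relations={
        "hyperonym": [("edificio",)],
        "hyponym": [("villetta",), ("catapecchia", "bicocca"), ("cottage",), ("bungalow",)],
        "role_location": [("stare", "abitare")],
        "role_target_direction": [("rincasare",)],
        "role_patient": [("affitto", "locazione")],
        "mero_part": [("vestibolo",), ("stanza",)],
        "holo_part": [("casale",), ("frazione",), ("caseggiato",)],
    },
)

# Flatten all hyponym synsets into a single list of lemmas.
hyponym_lemmas = [lemma for synset in casa.related("hyponym") for lemma in synset]
```

Keeping each target as a whole synset (a tuple of lemmas) rather than a single word reflects the point of the slide: relations hold between synsets, not between individual words.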
ItalWordNet Semantic Network
[Italian module of EuroWordNet]
~ 55,000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~ 130,000 semantic relations
~ 55,000 hyperonymy/hyponymy relations
~ 16,000 relations among different POS (role, cause, derivation, etc.)
~ 2,000 part-whole relations
~ 1,500 antonymy relations, … etc.
Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
Through the ILI, link to all the European WordNets (de-facto standard) & to the common Top Ontology
• Usable in IR, CLIR, IE, QA, ...
Possibility of plug-in with domain terminological lexicons (legal, maritime, … linguistic)
ItalWordNet: Clusters of “Base Concepts” classified according to Ontology Top Concepts
[Figure: top-ontology tree (Top; 1stOrderEntity, 2ndOrderEntity; SituationType, SituationComponent; Living, Location, Experience, Physical, Static, Dynamic, Natural, Covering, Part, Group, Composition, Origin, Function, Form, Object, Human, Mental, etc.) with clusters of words attached to its nodes. Legend: words vs. features.]
Example clusters of words:
skin, hair, body-covering
body part, cell, muscle, organ
direction, distance, spatial property, spatial relation, course, path
change of position, divide, locomotion, motion
feel, desire, disturbance, emotion, feeling, humor, pleasance
church, company, institute, organization, party, union
human, adult, adult female, adult male, child, native, offspring
Lexicon or ontology???
EWN Top-Ontology, ItalWordNet
1stOrderEntity
  Origin
    Natural
      Living: Plant, Human, Creature, Animal
    Artifact
  Form
    Substance: Solid, Liquid, Gas
    Object
  Composition: Part, Group
  Function: Vehicle, Representation (MoneyRepresentation, LanguageRepresentation, ImageRepresentation), Software, Place, Occupation, Instrument, Garment, Furniture, Covering, Container, Comestible, Building
2ndOrderEntity
  SituationType
    Dynamic: BoundedEvent, UnboundedEvent
    Static: Property, Relation
  SituationComponent: Cause (Agentive, Phenomenal, Stimulating), Communication, Condition, Existence, Experience, Location, Manner, Mental, Modal, Physical, Possession, Purpose, Quantity, Social, Time, Usage
3rdOrderEntity
EuroWordNet Multilingual Data Structure
[Figure: language-specific wordnets (Italian WN, Spanish WN, Dutch WN, English WN, French WN, German WN, Estonian WN, Czech WN, …) are linked to one another through the ILI, which is in turn linked to the TOP ONTOLOGY (e.g. ANIMAL, LIVING, HUMAN). Example: Dutch “hond”, English “dog”, Italian “cane” and Spanish “perro” all link to the ILI record “dog”.]
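The multilingual structure on this slide can be sketched as a lookup through a shared index. This is only an illustration under stated assumptions: the record identifier `ili-dog`, the dictionary layout and the function names are hypothetical, and the assignment of the "dog" record to ANIMAL and LIVING is an assumption read off the figure, not taken from the EuroWordNet database.

```python
# Hypothetical identifier for the ILI record shared by the four words on the slide.
ILI_DOG = "ili-dog"

# Each language-specific wordnet maps its own words to ILI records.
WORDNETS = {
    "nl": {"hond": ILI_DOG},
    "en": {"dog": ILI_DOG},
    "it": {"cane": ILI_DOG},
    "es": {"perro": ILI_DOG},
}

# ILI records, not individual words, carry the links to Top Ontology concepts.
# Assumption: the record for "dog" is classified under ANIMAL and LIVING.
TOP_ONTOLOGY = {ILI_DOG: ("ANIMAL", "LIVING")}


def equivalents(word, source, target):
    """Return the words of `target` linked to the same ILI record as `word`."""
    record = WORDNETS[source].get(word)
    if record is None:
        return []
    return sorted(w for w, r in WORDNETS[target].items() if r == record)


def top_concepts(word, language):
    """Return the Top Ontology concepts reached from `word` through the ILI."""
    record = WORDNETS[language].get(word)
    return TOP_ONTOLOGY.get(record, ())
```

No wordnet refers to another wordnet directly: adding a new language only requires linking its synsets to the ILI, which is what lets the index act as the pivot between languages.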
Terminological Wordnets: e.g. Jur-WordNet
Jur-WordNet: extension of ItalWordNet for the juridical domain (with ITTIG-CNR - Istituto di Teoria e Tecniche dell’Informazione Giuridica)
Knowledge base for multilingual access to sources of legal information
Source of metadata for semantic markup of legal texts
To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.
Terminological Lexicon of Navigation
Nolo
Synsets: 1,614
Lemmas: 2,116
Senses: 2,232
Nouns: 1,621
Verbs: 205
Adjectives: 35
Proper Nouns: 236
SIMPLE Lexicon & Ontology: Multidimensional Type Hierarchy
Shared by 12 European languages
Theoretical background: Generative Lexicon (Pustejovsky)
157 language-independent SIMPLE semantic types, based on hierarchical & non-hierarchical conceptual relations
Difference of internal complexity:
Simple types (one-dimensional), characterised in terms of hyperonymic relations
Unified types (multi-dimensional), only definable through the combination of:
the relation to their supertype + the reference to orthogonal dimensions of meaning (through the Qualia Structure)
http://www.ilc.cnr.it/clips/CLIPS_ENGLISH.htm
PAROLE-SIMPLE-CLIPS Lexicon: … harmonised model for 12 European languages
Overall Organization
[Figure: a shared Type Ontology (150 types) and Template are instantiated in the language-specific lexicons (Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon, ...). Each SemU carries: Pred. Layer (predicate, arguments, selection restrictions), Qualia, Derivation, Polysemy, Event Type, …]
Model Architecture. The first three levels: information content
Phonological Unit: stress position, vowel openness, consonant pronunciation
Corresp. PhnU-MrphU
Morphological Unit: PoS (& PoS subcategory), inflectional paradigm
Corresp. MrphU-SynU
Syntactic Unit: syntactic behaviour, Frameset
Synt. Struct 1: a. head properties; b. subcat. frame (syntactic arguments: position list, position restr.)
Synt. Struct 2: a. head properties; b. subcat. frame (syntactic arguments: position list, position restr.)
The semantic level: Information types
Semantic Unit
arguments: sem. role; sem. restr.
lexical predicate
Semantic properties
Ontological type
Domain
Event Type
Extended Qualia Structure
Synonymy
Regular Polysemy alt.
Derivation
Predicative Representation
Link to syntactic unit
[Figure labels: FEATURES; RELATIONS AMONG SEMUS]
SEMANTIC ENTRY CONTENT
Aumento (Increase): L’aumento dei prezzi di un venti% (“the increase in prices by twenty percent”)
• Gloss: accrescimento in dimensione o quantità (“growth in size or quantity”)
ONTOLOGICAL INFO.
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Eventype: transition
• Domain: general, economics
EXTENDED QUALIA INFO.
• Agentivecause: yes
• aumento Isa cambiamento
• aumento resulting_state maggiore
• Direction: up
• Morphological derivation: Eventverb aumentare
PREDICATIVE REPRESENTATION
• Semantic predicate: PRED_aumentare; 3 arguments
• Type of link: event nominalization
• Arguments description: range, semantic role & selectional restriction:
Arg0: Protoagent, Human / Institution
Arg1: ProtoPatient, Entity
Arg2: Quantifier, Amount
Semantic entry: USem3527vaporizzatore
free definition: apparecchio usato per vaporizzare (“device used for spraying”)
example: un vaporizzatore per piante (“a plant sprayer”)
ontological type: semantic type: Instrument; unification_path: [Concrete_entity | ArtifactAgentive | Telic]
event type: eventype: =====
domain information: cleaning, gardening, cosmetics
qualia features: =====
Extended Qualia Structure:
USem3527vaporizzatore isa Usem3479apparecchio
USem3527vaporizzatore has_as_part Usem61633pulsante
USem3527vaporizzatore created_by UsemD387fabbricare
USem3527vaporizzatore used_for UsemD66019nebulizzare
regular polysemy: =====
semantic relations:
USem3527vaporizzatore synonymy USem72288nebulizzatore
USem3527vaporizzatore instrumentverb Usem5239vaporizzare
predicative representation:
semantic predicate: PRED_vaporizzare-1
type of link: instrument nominalization
arguments description (range, semantic role, select. restrictions):
arg0_vaporizzare_1: Protoagent, Human/Instrument
arg1_vaporizzare_1: Protopatient, +liquid
arg2_vaporizzare_1: Location, Concrete_entity
from Nilda Ruimy
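The Extended Qualia Structure of this entry is a set of (SemU, relation, SemU) triples. A minimal sketch of grouping them by qualia role follows, assuming the relation-to-role mapping given by the ‘Instrument’ template elsewhere in these slides (isa: formal; made_of and has_as_part: constitutive; created_by: agentive; used_for: telic). The names `QUALIA_ROLE`, `TRIPLES` and `qualia_structure` are hypothetical; the identifiers are copied from the slide.

```python
from collections import defaultdict

# Relation -> qualia role, following the 'Instrument' template of the SIMPLE model.
QUALIA_ROLE = {
    "isa": "formal",
    "made_of": "constitutive",
    "has_as_part": "constitutive",
    "created_by": "agentive",
    "used_for": "telic",
}

# Triples copied from the vaporizzatore entry.
TRIPLES = [
    ("USem3527vaporizzatore", "isa", "Usem3479apparecchio"),
    ("USem3527vaporizzatore", "has_as_part", "Usem61633pulsante"),
    ("USem3527vaporizzatore", "created_by", "UsemD387fabbricare"),
    ("USem3527vaporizzatore", "used_for", "UsemD66019nebulizzare"),
]


def qualia_structure(semu, triples):
    """Group the qualia relations of one SemU by the role they express."""
    roles = defaultdict(list)
    for source, relation, target in triples:
        if source == semu and relation in QUALIA_ROLE:
            roles[QUALIA_ROLE[relation]].append((relation, target))
    return dict(roles)


vaporizzatore = qualia_structure("USem3527vaporizzatore", TRIPLES)
```

Relations that are not qualia relations in this mapping (for example synonymy or instrumentverb) are ignored by the grouping, mirroring the slide's separation between the Extended Qualia Structure and the other semantic relations.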
Semantic entry: USem79678regulate
free definition: regulation of a function or a physiological process
example: IL2 negatively regulates IL7
ontological type: semantic type: Cause_change_of_state; supertype: Cause_relational_change
event type: eventype: transition
domain information: biomedicine
qualia features: agentive_cause: yes; resulting_state: yes
Extended Qualia Structure:
formal: Usem79678regulate isa Usem64875process
constitutive: =====
agentive: =====
telic: =====
regular polysemy: =====
semantic relations: synonymy: =====; morpho. derivation: =====
predicative representation:
semantic predicate: PRED_regulate-1
type of link: master
arguments description (range, semantic role, select. restrictions):
arg0_regulate_1: Protoagent, Natural_Substance
arg1_regulate_1: Protopatient, Natural_Substance
from Nilda Ruimy
Semantic entry: USemTH31676parotite
free definition: Infiammazione delle ghiandole parotidi (“inflammation of the parotid glands”)
example: il bambino ha una parotite (“the child has mumps”)
ontological type: semantic type: Disease; unification_path: [Phenomenon | Agentive]
event type: eventype: =====
domain information: Ear-Nose-Throat
qualia features: agentive_cause: yes
Extended Qualia Structure:
USemTH31676parotite isa USem3868malattia
USemTH31676parotite affects USem1788ghiandola
USemTH31676parotite causes Usem72131gonfiore
USemTH31676parotite caused_by USem1971virus
USemTH31676parotite typical_of USem3593bambino
regular polysemy: =====
semantic relations: USemTH31676parotite synonymy USem79528orecchione
predicative representation: =====
from Nilda Ruimy
Syntactic entry: SYNU_regulateV
example: NF-AT positively regulates IL2, which negatively regulates IL7
head properties: verb; auxiliary: have; passivization: +
subcategorization frame (syntactic arguments):
P0: subject, mandatory, NP
P1: object, mandatory, NP
link to Semantic Unit: USem79678regulate
from Nilda Ruimy
Syntax-semantics mapping (1)
[Figure: a SyntacticUnit (Frameset; syntactic structure 1 and syntactic structure 2, each with a. head properties and b. subcat. frame, with position and synt. restr.) is linked through Corresp. SynU-SemU to a SemanticUnit (domain; semant. class; ontological type; event type; semant. features; semant. relations: synonymy, derivation; Extended Qualia Structure: formal role, constitutive role, telic role, agentive role; regular polysemy; predicative represent.: predicate, arguments, sem. restr., type of link). Corresp. Syntax-Semantics relates the syntactic positions to the arguments of the predicate.]
from Nilda Ruimy
Regulate: Syntax-Semantics mapping
SYNTAX
subcategorization frame: id: np-v-np
syntactic arguments:
P0: subject, mandatory, NP
P1: object, mandatory, NP
SEMANTICS
predicative representation:
semantic predicate: PRED_regulate-1
type of link: master
semantic arguments description (range, semantic role, select. restrictions):
arg0_regulate_1: Protoagent, Natural_Substance
arg1_regulate_1: Protopatient, Natural_Substance
<Correspondence id="ISObivalent" correspargposl="ARG0-P0 ARG1-P1 "> </Correspondence>
synsemcorrespondence
from Nilda Ruimy
SYNTAX-SEMANTIC MAPPING: aumentare (‘to increase’)
SYNTACTIC LEVEL
SynU_aumentare_V, Frameset:
Transitive structure: P0 P1 P2
Intransitive structure: P0 P1
SEMANTIC LEVEL
SemU1_aumentare, Sem.Type: CAUSE_CHANGE_OF_VALUE
SemU2_aumentare, Sem.Type: CHANGE_OF_VALUE
SEMANTIC PREDICATE (LINK PREDICATE-SEMANTIC UNIT)
PRED_aumentare_1
ARG0: Agent, Entity
ARG1: Patient, Entity
ARG2: Undersc., Amount
from N. Ruimy
SYNTAX-SEMANTIC MAPPING: CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME
PRED_aumentare: ARG0: Agent; ARG1: Patient; ARG2: Undersc.
SynU_aumentare_V, Frameset:
Transitive structure: P0 P1 P2
Intransitive structure: P0 P1
SemU1_aumentare (CAUSE_CHANGE_OF_VALUE): isomorphic correspondence
SemU2_aumentare (CHANGE_OF_VALUE): non-isomorphic corresp.
<Correspondence id="ISOtrivalent" correspargposl="ARG0-P0 ARG1-P1 ARG2-P2"> </Correspondence>
<Correspondence id="AUG2to3erg9" comment=" Augmented mapping from TWO Position description to THREE argument description. ARG0 not represented in syntax" correspargposl="ARG1-P0 ARG2-P1"></Correspondence>
from N. Ruimy
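The two Correspondence elements above pair predicate arguments with syntactic positions in the `correspargposl` attribute. A minimal sketch of reading such an element with Python's standard `xml.etree.ElementTree` follows; the helper names `parse_correspondence` and `unexpressed` are hypothetical, and the long `comment` attribute of the second element is omitted here for brevity.

```python
import xml.etree.ElementTree as ET

# Isomorphic mapping: every argument of the predicate has a syntactic position.
ISO = '<Correspondence id="ISOtrivalent" correspargposl="ARG0-P0 ARG1-P1 ARG2-P2"> </Correspondence>'
# Non-isomorphic mapping: two positions for a three-argument predicate.
AUG = '<Correspondence id="AUG2to3erg9" correspargposl="ARG1-P0 ARG2-P1"></Correspondence>'


def parse_correspondence(xml_text):
    """Return {argument: position} from the correspargposl attribute."""
    element = ET.fromstring(xml_text)
    pairs = element.attrib["correspargposl"].split()
    return dict(pair.split("-", 1) for pair in pairs)


def unexpressed(arguments, mapping):
    """List the predicate arguments that have no syntactic position."""
    return [argument for argument in arguments if argument not in mapping]


arguments = ["ARG0", "ARG1", "ARG2"]
iso_mapping = parse_correspondence(ISO)
aug_mapping = parse_correspondence(AUG)
missing_in_intransitive = unexpressed(arguments, aug_mapping)
```

For the intransitive structure the sketch reports ARG0 as unexpressed, which matches the comment in the source element ("ARG0 not represented in syntax").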
Relations and Predicates
SemU Sell (V), SemU Sale (N), SemU Seller (N)
Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>
Relations: Event_noun, Is_the_agent_of
“Predicate - semantic unit(s)” link & Relations
PRED_ACCUSARE <ARG0>, <ARG1>, <ARG2>
accusare (to accuse): master
accusatore (accuser): agent nominalisation
accusa (accusation): process nominalisation
accusato (accused): patient nominalisation
Relations: Is_the_agent_of, Event_noun
from Nilda Ruimy
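As a rough illustration of the “predicate - semantic unit(s)” link, the sketch below stores, for one predicate, the type of link each semantic unit has to it. The table layout and the function names are hypothetical; the words and link types are those on the slide.

```python
# Predicate -> {semantic unit: type of link to the predicate}.
PREDICATE_LINKS = {
    "PRED_ACCUSARE": {
        "accusare": "master",
        "accusatore": "agent nominalisation",
        "accusa": "process nominalisation",
        "accusato": "patient nominalisation",
    }
}


def predicate_of(semu):
    """Return (predicate, type of link) for a semantic unit, or None."""
    for predicate, links in PREDICATE_LINKS.items():
        if semu in links:
            return predicate, links[semu]
    return None


def family(semu):
    """Return the other semantic units that share the predicate of `semu`."""
    found = predicate_of(semu)
    if found is None:
        return []
    predicate, _link = found
    return sorted(unit for unit in PREDICATE_LINKS[predicate] if unit != semu)
```

Because the verb and its nominalisations point to one predicate, its argument structure is described once and shared by all of them.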
The SIMPLE ontology
Simple Ontology: multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations
In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information
from Nilda Ruimy
TOP
TELIC, AGENTIVE, CONSTITUTIVE, ENTITY
CONCRETE_ENTITY, ABSTRACT_ENTITY, PROPERTY, REPRESENTATION, EVENT, CAUSE
•Location
•Material
•Artifact
•Food
•Physical Object
•Organic Object
•Living Entity
•Substance
•PART
•GROUP
•AMOUNT
•Quality
•Psych Property
•Physi Property
•Social Property
•Domain
•Time
•Moral Standards
•Cognitive Fact
•Mvmt of Thought
•Institution
•Convention
•Abstract Location
•Language
•Sign
•Information
•Number
•Unit of measure
•Metalanguage
•Human
•Animal
•Vegetal Entity
•Artifact Material
•Furniture
•Clothing
•Container
•Artwork
•Instrument
•Money
•Vehicle
•Semiotic Artifact
Aspectual
Cause Aspect.
Phenomenon
•Weather verbs
•Disease
•Stimuli
State
•Exist
•Rel. State
Act
•Non Rel. Act
•Relational Act
•Move
•Cause Act
•Speech Act
Psychological_event
•Cognitive Event
•Experience Event
Change
•Rel. Change
•Change Possession
•Change Location
•Natural Transition
•Acquire Knowledge
Cause_change
•Cause Rel. Change
•Cause Change Location
•Cause Natural Transition
•Creation
•Give Knowledge
from Nilda Ruimy
The SIMPLE ontology: Multidimensionality
Ontology of Structured Semantic Types: a Template
Schema providing a set of structured information crucial to the definition of a semantic type
Interface between ontology & lexicon
Guide for the lexicographer
SemU: Identifier of the Semantic Unit
Related SynU: Identifier of the Syntactic Unit the SemU is related to
IWN Base Concept: Number of the corresponding ItalWordNet base concept
Template_Type: [Container]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: General
Semantic Class: Link to the LexiQuest (or any other ontology)
Gloss: Lexicographic gloss
Predicative Representation: Predicate associated to the SemU and its argument structure [container_pred (arg0)]
Arg. Selectional Restrictions: Selectional restrictions (Arg0-HeadQuantified-Substance)
Derivation: Derivational relations between SemUs
Qualia_Formal: isa (1, <container> or <hyperonym>)
Qualia_Agentive: created_by (1, <Usem>: [CREATION]) //definitorial//
Qualia_Constitutive: made_of (1, <Usem>) //optional//
  has_as_part (1, <Usem>) //optional//
  contains (1, <Usem>)
Qualia_Telic: used_for (1, <contain>) //definitorial//
  used_for (1, <measure>) //optional//
Synonymy: Synonyms of the SemU //optional//
Regular Polysemy: [Amount] [Container]
Semantic type in the SIMPLE Ontology
Not just a label but rather a classificatory device consisting of a cluster of structured semantic information
Type assignment means endowing a word-sense with a structured set of semantic features and relations with a view to:
distinguishing it from other senses of the same word
expressing its similarity with other words
expressing its relationships to other words
drawing inferences from this information
Each semantic type is associated with a template, i.e. a schematic structure that contains a cluster of type-defining properties and imposes constraints on lexical items for type membership
Templates: interface between Ontology and Lexicon
Template-driven encoding methodology ensures internal and cross-lexicon consistency
from Nilda Ruimy
Template for the sem. type ‘Instrument’
[Side labels: ontological information; predicative representation; extended qualia structure]
SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
Template_Type: Instrument
Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: Domain information
Semantic Class: One of WordNet Classes
Gloss: Lexicographic definition
Event Type: Type of event (state, process, transition)
Predicative Representation: Predicate associated with the SemU, and its argument structure
Selectional Restr.: Selectional restrictions on the arguments
Derivation: Derivational relations between SemUs
Formal: Usem_1 isa Usem_2 [Artifact]
Agentive: Usem_1 created_by Usem_2 [Creation]
Constitutive: Usem_1 made_of Usem_2 [Substance] OPTIONAL
  Usem_1 has_as_part Usem_2 [Artifact] OPTIONAL
Telic: Usem_1 used_for Usem_2 [Event]
Synonymy: Synonyms of the SemU
Collocates: Collocate information
Complex: Polysemous class of the SemU
from Nilda Ruimy
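As an illustration of how a template can impose constraints on type membership, here is a minimal Python sketch. It encodes only the qualia part of the ‘Instrument’ template above. The names `SemU`, `INSTRUMENT_TEMPLATE` and `check_membership`, and the sample entries, are hypothetical and not part of SIMPLE; the type check is a plain string comparison that ignores the type hierarchy.

```python
from dataclasses import dataclass, field

# Hypothetical encoding of the qualia slots of the 'Instrument' template:
# (qualia role, relation, semantic type of the target, optional?)
INSTRUMENT_TEMPLATE = [
    ("Formal", "isa", "Artifact", False),
    ("Agentive", "created_by", "Creation", False),
    ("Constitutive", "made_of", "Substance", True),
    ("Constitutive", "has_as_part", "Artifact", True),
    ("Telic", "used_for", "Event", False),
]


@dataclass
class SemU:
    ident: str
    template_type: str
    # relation name -> (target SemU identifier, semantic type of the target)
    relations: dict = field(default_factory=dict)


def check_membership(semu, template):
    """Return the list of template violations (an empty list means valid)."""
    problems = []
    for role, relation, target_type, optional in template:
        if relation not in semu.relations:
            if not optional:
                problems.append(f"missing obligatory {role} relation {relation}")
            continue
        _, actual_type = semu.relations[relation]
        if actual_type != target_type:
            problems.append(
                f"{relation} target must be [{target_type}], got [{actual_type}]"
            )
    return problems


# Invented entries, for illustration only.
coltello = SemU("coltello_1", "Instrument", {
    "isa": ("strumento_1", "Artifact"),
    "created_by": ("fabbricare_1", "Creation"),
    "used_for": ("tagliare_1", "Event"),
})
incomplete = SemU("x_1", "Instrument", {"isa": ("strumento_1", "Artifact")})

print(check_membership(coltello, INSTRUMENT_TEMPLATE))
print(check_membership(incomplete, INSTRUMENT_TEMPLATE))
```

The optional Constitutive slots may be left out without raising a violation, which mirrors the OPTIONAL marking in the template.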
N. Calzolari 41Dottorato, Pisa, Maggio 2009
Top
Formal Constitutive Agentive Telic
Is_a Is_a_part_of Property
Contains
Created_by Agentive_cause Indirect_telic Purpose
Instrumental
Is_the_habit_of Used_for Used_as
...
The targets of relations identify:
prototypical semantic information associated with a SemU
elements of dictionary definitions of SemUs
typical corpus collocates of the SemU
100 Rels.
....
Activity ....
For a BioLexicon
N. Calzolari 42Dottorato, Pisa, Maggio 2009
Qualia Structure
Consists of four qualia roles encoding orthogonal dimensions of meaning:
formal role (general identification)
constitutive role (composition)
agentive role (origin)
telic role (function)
One of the four levels of semantic representation in the theory of Generative Lexicon
N. Calzolari 43Dottorato, Pisa, Maggio 2009
Extended Qualia Structure
[Figure: hierarchy of relations under the four qualia roles Formal, Constitutive, Agentive, Telic]
Formal: isa, antonym_comp, antonym_grad, mult_opposition
Agentive (with an Artifactual Agentive subgroup): result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source, created_by, derived_from
Constitutive (with Property and Location subgroups): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling, is_in, lives_in, typical_location
Telic (with Instrumental, Direct Telic and Activity subgroups): used_for, used_as, used_by, used_against, indirect_telic, purpose, object_of_activity, is_the_activity_of, is_the_ability_of, is_the_habit_of
regulates, is_regulated_by …..
Example word pairs: proiettile, colpire (projectile, hit); bisturi, chirurgo (lancet, surgeon); medico, curare (doctor, cure); disgusto, provare (disgust, feel); casa, costruire (house, build); mohair, capra (mohair, goat); pane, farina (bread, flour); senatore, senato (senator, senate); manubrio, bicicletta (handlebar, bicycle)
N. Calzolari 44Dottorato, Pisa, Maggio 2009
“Extended” Qualia Structure
[Figure: the relation hierarchy under Formal, Constitutive, Agentive, Telic, with additions marked NEW!]
Formal: is_a, antonym_comp, antonym_grad, mult_opposition
Agentive (with an Artifactual Agentive subgroup): result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source, created_by, derived_from
Constitutive (with Property and Location subgroups): made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses, causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling, is_in, lives_in, typical_location
Telic (with Instrumental, Direct Telic and Activity subgroups): used_for, used_as, used_by, used_against, indirect_telic, purpose, object_of_activity, is_the_activity_of, is_the_ability_of, is_the_habit_of
regulates, is_regulated_by …..
NEW!
Example word pairs: T-cell, Blood Stem Cell; Ribose, Nucleotide; Catalyze, Enzyme
N. Calzolari 45Dottorato, Pisa, Maggio 2009
botte (barrel)
traditional dictionary definition: recipiente di legno fatto di doghe arcuate tenute unite da cerchi di ferro che serve per la conservazione e il trasporto di liquidi, specialmente vino (a wooden container made of curved staves held together by iron hoops, used for storing and transporting liquids, especially wine)
Meaning dimensions expressed by Qualia relations
recipiente → Formal: isa
di legno → Constitutive: made_of
fatto → Agentive: created_by
di doghe arcuate tenute unite da cerchi di ferro → Constitutive: made_of
che serve per la conservazione e il trasporto → Telic: used_for
di liquidi, specialmente vino → Constitutive: contains
from Nilda Ruimy
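The decomposition of the botte definition into qualia relations can be sketched as a small set of tuples. This is only an illustration: the tuple layout and the helper `by_role` are hypothetical, and the pairing of each definition fragment with a relation is read off the slide layout rather than taken from the SIMPLE lexicon itself.

```python
from collections import defaultdict

# (headword, qualia role, relation, word taken from the dictionary definition)
# The pairings are an assumption based on the slide, not SIMPLE data.
BOTTE = [
    ("botte", "Formal", "isa", "recipiente"),
    ("botte", "Constitutive", "made_of", "legno"),
    ("botte", "Agentive", "created_by", "fatto"),
    ("botte", "Constitutive", "made_of", "doghe"),
    ("botte", "Telic", "used_for", "conservazione"),
    ("botte", "Telic", "used_for", "trasporto"),
    ("botte", "Constitutive", "contains", "liquidi"),
    ("botte", "Constitutive", "contains", "vino"),
]


def by_role(entries):
    """Group (relation, target) pairs under their qualia role."""
    grouped = defaultdict(list)
    for _, role, relation, target in entries:
        grouped[role].append((relation, target))
    return dict(grouped)


for role, pairs in by_role(BOTTE).items():
    print(role, pairs)
```

Grouping by role makes the four meaning dimensions of the single definition explicit, which is the point the slide makes graphically.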
N. Calzolari 46Dottorato, Pisa, Maggio 2009
Multidimensional Knowledge Bases … by using Lexical Resources
Ala (wing)
SemU: 3232 Type: [Part] Parte di aeroplano (part of an airplane)
SemU: 3268 Type: [Part] Parte di edificio (part of a building)
SemU: D358 Type: [Body_part] Organo degli uccelli (organ of birds)
SemU: 3467 Type: [Role] Ruolo nel gioco del calcio (role in the game of football)
[Figure: graph linking these four SemUs to other words through qualia relations]
Relation labels shown: used_for, part_of, isa, member_of, agentive
Linked words shown: volare, aeroplano, uccello, edificio, giocatore, fabbricare, squadra
N. Calzolari 47Dottorato, Pisa, Maggio 2009
Semantic Multidimensionality & NLP
NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning:
Extended Qualia Relations
Is_a_part_of: la pagina del libro (the page of the book)
Member_of: il difensore della Juventus (Juventus fullback)
Telic: il suonatore di liuto (the lute player)
Made_of: il tavolo di legno (the wooden table)
N. Calzolari 48Dottorato, Pisa, Maggio 2009
Disambiguation = Interpretation of conceptual relations in context
duna di sabbia (sand dune) ?
bicchiere di birra (glass of beer) ?
fetta di pane (slice of bread) ?
Candidate relations: made_of, is_a_part_of, contains
ONTOLOGY
……..
SUBSTANCE
ARTIFACTUAL_DRINK ……….
liquid
from Nilda Ruimy
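A toy sketch of how ontological types could drive the interpretation of "N di N" phrases such as those above. Everything here is an assumption made for illustration: the semantic types given to each word, the three ordered rules, and the function name `interpret`. The slide leaves the relation for each phrase as an open question, so the outputs show one plausible reading, not a result from SIMPLE.

```python
# Hypothetical toy lexicon: word -> semantic type (types are illustrative).
SEM_TYPE = {
    "duna": "Location",
    "bicchiere": "Container",
    "fetta": "Part",
    "sabbia": "Substance",
    "vetro": "Substance",
    "birra": "Artifactual_drink",
    "pane": "Food",
}
LIQUID_TYPES = {"Artifactual_drink"}


def interpret(noun_phrase):
    """Guess the relation between head and modifier in 'head di modifier'."""
    head, sep, modifier = noun_phrase.partition(" di ")
    if not sep:
        return None
    head_type = SEM_TYPE.get(head)
    mod_type = SEM_TYPE.get(modifier)
    if head_type is None or mod_type is None:
        return None
    # Rule 1 (assumed): a portion noun denotes a part of the modifier.
    if head_type == "Part":
        return "is_a_part_of"
    # Rule 2 (assumed): a container with a liquid modifier holds it.
    if head_type == "Container" and mod_type in LIQUID_TYPES:
        return "contains"
    # Rule 3 (assumed): a substance modifier names the material.
    if mod_type == "Substance":
        return "made_of"
    return None


for phrase in ["duna di sabbia", "bicchiere di birra",
               "bicchiere di vetro", "fetta di pane"]:
    print(phrase, "->", interpret(phrase))
```

The pair bicchiere di birra / bicchiere di vetro shows why the same preposition cannot be interpreted without the semantic type of the modifier.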
N. Calzolari 49Dottorato, Pisa, Maggio 2009
Domain - Semantic class
[Figure: semantic network centred on mangiare (to eat), linking words of different semantic types through qualia relations]
Relations shown: Used_for, Object_of_the_activity, Is_the_activity_of, Created_by (AGENTIVE, TELIC)
Semantic types shown: FURNITURE, INSTRUMENT, BUILDING, CONTAINER, PROFESSION, FOOD, ARTIFACT_FOOD, SUBSTANCE_FOOD, VEGETABLES, FRUIT, VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE; feature +edible
Nouns shown: tavola, forchetta, posata, ristorante, mestolo, pentola, friggitrice, bollitore, pesciera, cuoco, coniglio, carne, mela, carota, arrosto, pesce, zucchero, alloro, tartufo
Verbs shown: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, …
from Nilda Ruimy
N. Calzolari 50Dottorato, Pisa, Maggio 2009
Noun Compounds/Complex Nominals … are pervasive
There is a motivation in most N+N constructions: the context provides it
The FrameNet (SIMPLE) way
appeal to specific frame structures (qualia structures) associated with the head noun,
determine from corpus attestations which frame elements (qualia) can get instantiated as a modifier word
“container”: complex nominals can specify:
• material (aluminium c., glass c., …)
• contents (food c., trash c., …)
• size (3 quart c., …)
• function (shipping c., storage c., …)
• ...
N. Calzolari 51Dottorato, Pisa, Maggio 2009
Noun Compounds/Complex Nominals & multidimensional semantic approaches
a. FrameNet
“Container” Frame Structure: Frame Elements:
Material: aluminum container, glass c., metal c., tin c.
Contents: food container, beverage c., trash c., water c., milk c., fuel c.
Size: 3 quart container
Function: shipping container, storage c.
b. SIMPLE
Qualia Relations of "container" as used in compounds:
Constitutive: made_of [MATERIAL] aluminum container, glass c., metal c., tin c.
Telic: contains [ENTITY] food container, beverage c., trash c., water c., milk c., fuel c.
Constitutive: size [QUANTITY] 3 quart container
Telic: is_used_for [EVENT] shipping container, storage c.
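The SIMPLE reading of "container" compounds can be sketched as a lookup from the semantic type of the modifier to a qualia relation. The relation table follows the slide. The modifier lexicon `MODIFIER_TYPE` and the function `interpret_compound` are hypothetical; a real system would take the types from the lexicon and from corpus attestations.

```python
# Qualia relation selected by the semantic type of the modifier (from the slide).
RELATION_BY_TYPE = {
    "MATERIAL": ("Constitutive", "made_of"),
    "ENTITY": ("Telic", "contains"),
    "QUANTITY": ("Constitutive", "size"),
    "EVENT": ("Telic", "is_used_for"),
}

# Hypothetical modifier lexicon covering only the slide's examples.
MODIFIER_TYPE = {
    "aluminum": "MATERIAL", "glass": "MATERIAL",
    "metal": "MATERIAL", "tin": "MATERIAL",
    "food": "ENTITY", "beverage": "ENTITY", "trash": "ENTITY",
    "water": "ENTITY", "milk": "ENTITY", "fuel": "ENTITY",
    "3 quart": "QUANTITY",
    "shipping": "EVENT", "storage": "EVENT",
}


def interpret_compound(compound, head="container"):
    """Return (qualia role, relation) for 'modifier head', or None if unknown."""
    suffix = " " + head
    if not compound.endswith(suffix):
        return None
    modifier = compound[: -len(suffix)]
    sem_type = MODIFIER_TYPE.get(modifier)
    if sem_type is None:
        return None
    return RELATION_BY_TYPE[sem_type]


for c in ["glass container", "milk container",
          "3 quart container", "storage container"]:
    print(c, "->", interpret_compound(c))
```

A single type per modifier is a simplification: a word such as "glass" could in principle name either a material or a content, which is where corpus evidence would be needed.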
N. Calzolari 52Dottorato, Pisa, Maggio 2009
Complex Nominals
E.g. knife (coltello) triggers:
a “cutting frame” (FrameNet)
specific (SIMPLE) dimensions of meaning
SIMPLE Extended Qualia structure for the interpretation of the semantic relation betw. Ns (internal relational structure of MWE)
butcher’s knife (coltello da macellaio) TELIC (used_by) Y [Human] PPda
plastic knife (coltello di plastica) CONST (made_of) X [Material] PPdi
table knife (coltello da tavola) TELIC (used_in) Z [Location] PPda
hunting knife (coltello da caccia) TELIC (used_in_activity) E [Activity] PPda
piatto di legno CONST (made_of) X [Material] PPdi
piatto di pasta CONST (contains) X [Food] PPdi
PP disambig.
N. Calzolari 53Dottorato, Pisa, Maggio 2009
Deverbal nominalisation:
o noun murder (uccisione, delitto, omicidio (different sem. pref.)) PPdi
PPda_parte_di, di
o verb murder (uccidere) subj:NP
obj:NP
:instr: PPcon [Weapon] (knife m., con coltello)
:means: PPper [Action] (strangulation m., per strangolamento)
:loc: PPloc|di [Location] (Kent State murders, nel ...)
:time: PPtime|di [Time] (1983 murders, del 1983)
SIMPLE: possible extension
As if it were a Situation
PRED: MURDER (uccidere)
ARG1: agent [Hum/Anim?]
ARG2: patient [Hum/Anim?]
MOD1: instr [Weapon]
MOD2: means [Action]
MOD3: ... […]
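The proposed "Situation" extension can be sketched as a small data structure. The class names `Slot` and `Situation` and the method `role_of_modifier` are hypothetical. Only the slots spelled out on the slide are encoded, so the elided MOD3 is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Slot:
    label: str                         # e.g. "ARG1", "MOD1"
    role: str                          # e.g. "agent", "instr"
    sem_type: str                      # semantic restriction on the filler
    realisation: Optional[str] = None  # Italian PP type, where the slide gives one


@dataclass(frozen=True)
class Situation:
    pred: str
    lemma_it: str
    args: Tuple[Slot, ...]
    mods: Tuple[Slot, ...]

    def role_of_modifier(self, sem_type):
        """Return the role a modifier of this semantic type would fill, if any."""
        for slot in self.mods:
            if slot.sem_type == sem_type:
                return slot.role
        return None


MURDER = Situation(
    pred="MURDER",
    lemma_it="uccidere",
    args=(
        Slot("ARG1", "agent", "Hum/Anim"),
        Slot("ARG2", "patient", "Hum/Anim"),
    ),
    mods=(
        Slot("MOD1", "instr", "Weapon", "PPcon"),
        Slot("MOD2", "means", "Action", "PPper"),
    ),
)

# "knife murder": the modifier is a [Weapon], so it fills the instrument slot.
print(MURDER.role_of_modifier("Weapon"))
# "strangulation murder": the modifier is an [Action], so it fills the means slot.
print(MURDER.role_of_modifier("Action"))
```

Treating the nominalisation like its verb in this way lets the same slot inventory interpret both "knife murder" and "uccidere con coltello".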
N. Calzolari 54Dottorato, Pisa, Maggio 2009
Ontologisation of SIMPLE
Automatically converting and enriching a computational lexicon into a formal Ontology
For NLP semantic tasks
Potential of ontologies in NLP as
Backbone in LKBs
Pivot in multilingual architectures (e.g. KYOTO)
Reasoning capabilities
Ontologisation of SIMPLE into OWL
Conversion of the SIMPLE ontology
Bottom-up enrichment: promoting lexicon knowledge to the ontology level
Language independent knowledge from Italian lexico-semantic information
from Antonio Toral
N. Calzolari 55Dottorato, Pisa, Maggio 2009
Named Entity Repository
Automatically build LRs from existing LRs and Web 2.0 semi-structured resources. Combine:
Authoritative lexicographic experience → precision
Collaborative “wisdom of the crowds” → recall
Case study: Multilingual NE repository from LRs (en WN, es WN, it SIMPLE) & Wikipedia
NEs linked to three LRs and two ontologies (SUMO, SIMPLE)
Interoperable resource: LMF compliant
Applied to cross-lingual QA (validate answers): prec. +16.3%
from Antonio Toral
N. Calzolari 56Dottorato, Pisa, Maggio 2009
Use of SIMPLE Lexicon & Ontology for Time and Event detection/annotation
Different PoS may realise an event: verbs, nouns, adjectives, prep. phrases
The SIMPLE Lexicon helps in identifying & classifying Events (eventive nouns & adjectives) → in a 10K Words Annotation Experiment
each event is associated with an Ontological Type
the Event-Type from the SIMPLE-Ontology can be used as default value to provide event composition, and consequently to instantiate a temporal representation for each Event
improvement both in identification & classification of Events by annotators: 81.17% accuracy (vs. 72.35%) and K-coefficient = 0.84 (vs. 0.7)
[Figure: Morpho-Syntactic Analysis, SIMPLE Lexicon, Event Detection & Classification]
from Tommaso Caselli
N. Calzolari 57Dottorato, Pisa, Maggio 2009
Mapping SIMPLE Semantic Types to TimeML Classes
from Tommaso Caselli
N. Calzolari 58Dottorato, Pisa, Maggio 2009
GLML – Generative Lexicon Markup Language
with James Pustejovsky, Olga Batiukova, Anna Rumshisky, Marc Verhagen
Annotating texts with Argument Selection, Argument Coercion, & Qualia Roles
The corpus brings reality to the model, provides statistical cues to improve language models
Lexical semantic info, like type coercion/selection, required for applications such as WSD, categorisation, IR (query reformulation, filtering…), IE (coreference resolution, relation extraction…), entailment, ..
Predicate – Argument constructions
• Predicate Sense Disambiguation
• Argument selection: type selection/coercion
• Qualia role/relation selection
Modification constructions
• Noun Sense Disambiguation
• Qualia role/relation selection in Adjectival Modification
• Qualia role/relation selection in Nominal Modification
Complex Types
• Type selection in modification of Dot Objects
from Valeria Quochi
N. Calzolari 59Dottorato, Pisa, Maggio 2009
Using Existing Resources for Italian
SIMPLE Lexicon & Ontology / ItalWordNet
Sense Disambiguation
Type selection/coercion
Type selection in Dot Objects
SIMPLE Extended Qualia Structure
Selection of Qualia roles/relations, e.g.
Constitutive Relations
e.g. Is_a_part_of, Is_a_member_of
Telic Relations
e.g. Purpose, Object_of_the_activity
Agentive Relations
e.g. Source, Result_of
from Valeria Quochi
N. Calzolari 60Dottorato, Pisa, Maggio 2009
Ontology & Lexicon
Today we can easily say that ontology learning, i.e. the practical feasibility of supporting knowledge acquisition in a domain, depends on developing automatic methods for acquiring conceptual representations from natural language text
Semantic Web initiatives are also focussing on the building of ontological representations from texts, and in this respect show a large amount of conceptual overlap with the notion of a dynamic lexicon
Lexicon & Corpus
Based on various experiences, and as a work strategy for lexical/textual resources
We should push towards innovative types of lexicons: a sort of ‘example-based living lexicons’ that share properties of both lexicons and corpora
In such a lexicon redundancy is not a problem, but rather a benefit
N. Calzolari 61Dottorato, Pisa, Maggio 2009
BUT… Mismatch between LRs and LT
Often a gap between advancement in LRs and LT
Either adequate LRs are missing … or there are no systems able to use “knowledge intensive” LRs effectively
Shortcomings:
lack of usable implementations fully exploiting new types of LRs
LR claims are not empirically evaluated
A parallel evolution of R&D for both LRs and LT is needed
N. Calzolari 62Dottorato, Pisa, Maggio 2009
Phenomena to be represented/What is missing??
from Ed Hovy
1. Bracketing / grouping of predications around entities (basic frame structure)
2. Concepts:
Choice of meaning/sense, with frames in some cases
Definition and nature of concept repository / ontology
Major high-level concept groupings and classes
3. Labels on (dependency) arcs (thematic roles, types of attributes, modifiers, etc.)
4. Coreference (explicit and indirect):
intra-sentential
intersentential and cross-documents
5. Information Structure and Discourse structure:
theme-rheme and topic-focus
salience
coordination
nonsemantic inter-clausal relations (RST’s interpersonal ones)
etc.
done
done
done??
done??
N. Calzolari 63Dottorato, Pisa, Maggio 2009
Phenomena to be represented/What is missing??
Ed Hovy
6. Pragmatics:
Speech Acts
Participants and audience modeling
Modality:
Epistemic modalities
Deontic modalities
Personal attitudes
Deixis / reference to external world (or databases)
Social register, genre, and style
7. Polarity (including scoping)
8. Microtheories (many of them to be incorporated elsewhere)
Time (Reichenbach)
Space (OWL upper ontology of space, etc.)
Cardinality
Quantification
Manner
Degree and comparison
Possession
Existentials
Copular constructions
Conditionals
Consequences and inference
Co-text and intertextuality (including formatting and other media)
Meaning of prosody and other speech-related effects
done??
done??
Towards a common encoding policy???
N. Calzolari 64Dottorato, Pisa, Maggio 2009
Lexicon and Corpus: a multi-faceted interaction
L → C tagging
C → L frequencies (of different linguistic “objects”)
C → L proper nouns, acronyms, …
L → C parsing, chunking, …
C → L training of parsers
C → L lexicon updating
C → L “collocational” data (MWE, idioms, gram. patterns ...)
C → L “nuances” of meanings & semantic clustering
C → L acquisition of lexical (syntactic/semantic) knowledge
L → C semantic tagging/word-sense disambiguation (e.g. in Senseval)
C → L more semantic information on LE
C → L corpus based computational lexicography
C → L validation of lexical models
C → L ……
L → C ...
N. Calzolari 65Dottorato, Pisa, Maggio 2009
… Dynamic lexicons
Current computational lexicons (even WordNets) are static objects, still shaped on traditional dictionaries
Towards a flexible model of dynamic lexicon
extending the expressiveness of a core static lexicon
adapting to the requirements of language in use as attested in corpora
with semantic clustering techniques, etc.
Convert the extreme flexibility & multidimensionality of meaning into large-scale and exploitable (VIRTUAL?) resources
Sort of Example-based Lexicon: a “Lexicon & Corpus” together
BUT
N. Calzolari 66Dottorato, Pisa, Maggio 2009
Verb/Arguments Interaction at the Lexical-Semantic Level
Verb meaning determines/selects the ‘sense’ of its subject and/or direct object
e.g. arrestare, both ‘to arrest’ & ‘to stop’, selects direct objects which have themselves, or receive from the verb, a negative connotation
Dobj            Sem.type          Conn.Feat.
ladro1          agent_temp_act    neg
spacciatore1    agent_temp_act    neg
trafficante1    agent_temp_act    neg
traffico2       act               neg
invasione1      cause_act         neg
massacro1       cause_nat_trans   neg
inflazione1     event             neg
pregiudicato1   human             neg
balordo1        human             neg
maniaco1        human             neg
strozzino1      agent_temp_act    neg
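The direct-object table for arrestare can be handled as plain data, for instance to check that the negative connotation holds across all attested objects whatever their semantic type. The rows below are copied from the slide; the variable names are illustrative.

```python
from collections import Counter

# (direct object sense, semantic type, connotation feature), as on the slide
ARRESTARE_DOBJ = [
    ("ladro1", "agent_temp_act", "neg"),
    ("spacciatore1", "agent_temp_act", "neg"),
    ("trafficante1", "agent_temp_act", "neg"),
    ("traffico2", "act", "neg"),
    ("invasione1", "cause_act", "neg"),
    ("massacro1", "cause_nat_trans", "neg"),
    ("inflazione1", "event", "neg"),
    ("pregiudicato1", "human", "neg"),
    ("balordo1", "human", "neg"),
    ("maniaco1", "human", "neg"),
    ("strozzino1", "agent_temp_act", "neg"),
]

# How many objects fall under each semantic type.
type_counts = Counter(sem_type for _, sem_type, _ in ARRESTARE_DOBJ)

# True when every object carries the negative connotation feature.
all_negative = all(conn == "neg" for _, _, conn in ARRESTARE_DOBJ)

print(type_counts)
print(all_negative)
```

The semantic types vary (human, act, event, ...) while the connotation feature stays constant, which is what makes the feature the better generalisation over the verb's objects.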
N. Calzolari 67Dottorato, Pisa, Maggio 2009
Complexity of Word Sense in context: many potential clues
A particular meaning (of a verb) may be selected by:
A specific syntactic pattern
comprendere + that-clause = ‘to understand’ [not = ‘to include’]
aprire + PP introduced by a (preferably with “human” head) = ‘to be ready, open, well disposed towards someone’ (e.g. Cossiga apre a La Malfa)
The semantic type of subjects, dir. objects, ind. objects
human subject (if not collective type) always selects the meaning ‘to understand’ of the verb comprendere
The domain of use
perseguire un reato ‘to prosecute a crime’ (domain=law)
A specific modifier
perseguire penalmente ‘to prosecute at the penal level’, not ‘to pursue (a goal)’
comprendere benissimo ‘to understand very well’, not ‘to include’
Two different senses of a lemma cannot be selected simultaneously in the same context
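The clues listed for comprendere can be sketched as a small rule cascade. The function name, the feature names and the order in which the rules are tried are assumptions made for illustration; the slide only states that each clue, when present, selects the sense ‘to understand’.

```python
def select_sense_comprendere(complement=None, subject_type=None, modifier=None):
    """Return (sense, clue) for an occurrence of 'comprendere', or (None, None).

    complement   -- syntactic type of the complement, e.g. "that-clause"
    subject_type -- semantic type of the subject, e.g. "human"
    modifier     -- adverbial modifier, e.g. "benissimo"
    """
    # Clue 1: a specific syntactic pattern.
    if complement == "that-clause":
        return "understand", "syntactic pattern"
    # Clue 2: semantic type of the subject (non-collective human).
    if subject_type == "human":
        return "understand", "semantic type of subject"
    # Clue 3: a specific modifier.
    if modifier == "benissimo":
        return "understand", "modifier"
    # No clue found: the sense stays undecided.
    return None, None


print(select_sense_comprendere(complement="that-clause"))
print(select_sense_comprendere(subject_type="human_collective", modifier="benissimo"))
print(select_sense_comprendere(subject_type="human_collective"))
```

Returning `(None, None)` rather than a default sense keeps the undecided cases visible instead of silently guessing.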
BUT…
N. Calzolari 68Dottorato, Pisa, Maggio 2009
Complexity of Word Sense identification
The problem: no sure tests
only partial validity & not completely discriminating
Moreover, it’s not easy to predict when to apply which test
Word Sense Disambiguation (WSD) in different contexts is better achieved using info types at different levels of linguistic description:
morphosyntactic/syntactic/semantic/pragmatic…, even multilingual
BUT it is unpredictable a priori where the “clue” is
N. Calzolari 69Dottorato, Pisa, Maggio 2009
Complexity of Word Sense & use of Corpora
The availability of large quantities of semantically tagged corpora helps to
analyse the impact of different “clues” to perform WSD in different contexts
study the interaction of clues belonging to different levels of linguistic description, to improve WSD strategies
not just statistics!!
Automatically acquire syntactic, semantic, collocational (lexical) ‘indicators’ which can help in the identification of a word-sense
‘List’ them in the lexicon??
N. Calzolari 70Dottorato, Pisa, Maggio 2009
Problem of regular polysemy … and more
BUT… actual occurrence of “two senses” in the same context…
e.g. both act & result (for deverbal nouns, etc.)
In una comunicazione al Parlamento la Commissione ha illustrato le sue riflessioni su … (In a communication to Parliament the Commission presented its reflections on …)
Berlusconi dovrà scegliere se fare l’uomo di governo o mantenere il controllo delle sue tv (Berlusconi will have to choose whether to be a man of government or to keep control of his TV channels)
Underspecified meanings?
maybe subsuming more granular distinctions, to be used only when disambiguation is feasible/useful in a context
Theoretical language, “invented” by lexicographers/linguists who have/want to classify in disjoint classes, vs.
actual usage
a “continuum” resistant to clear-cut disjunctions
by necessity ambiguous wrt imposed classifications
N. Calzolari 71Dottorato, Pisa, Maggio 2009
… what cannot be easily encoded at the Lexical-Semantic Level
In a “Senseval” framework …
When sense interpretation requires appeal to extra-linguistic knowledge (not to be captured at the lexical-semantic level of description)
When corpus annotation either diverges from the lexical resource or further specifies it
e.g.
words acquiring a specific sense, strictly dependent on the context
la donna Pauline Collins, che ha già visto arrestare il marito dai tedeschi, …
variety of nuances of a verb, e.g. according to co-occurring dir.obj. sem-type
metaphors extended to an entire sentence
l’auto verde arriva sul tavolo del governo
(lit. the green car arrives on the table of the government)
...
Not all these “shifts of meanings” can/must be captured through lexical-semantic annotation
N. Calzolari 72Dottorato, Pisa, Maggio 2009
Wrt Senseval
jargon, neologisms, evaluative suffixation, ‘titles’, …
vetturetta
minitaxi
fumantino (agg. una persona fumantina)
komeinista
…
Primula rossa (= boss mafioso)
Scarpa d'oro (= un bravo giocatore)
…
Not in any lexicon …
a semantic type easier to assign than a word-sense in a lexicon
N. Calzolari 73Dottorato, Pisa, Maggio 2009

Compounds and idioms

uscire di scena, farla franca, fare fuoco, andare in onda, …
fare [in tempo], andare [a piedi], essere [in testa] (= to be first), vincere [per un soffio], partire [a razzo]
Croce Rossa, Caschi Blu, conflitto a fuoco, atletica leggera, famiglia bene, un bagno di folla, …
Where is the boundary of the MWE? "andare_a_piedi" vs. andare (Pos V) a_piedi (Pos Adv.loc)?
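
The boundary question above can be made concrete with a minimal sketch. It assumes a hypothetical MWE lexicon and a simple greedy longest-match merger; the names `WIDE_MWES`, `NARROW_MWES` and `merge_mwes`, as well as the input tags `E` (preposition) and `S` (noun), are illustrative assumptions and not part of any resource named in these slides. The same token sequence yields two different analyses depending on where the lexicon places the MWE boundary.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (form, part-of-speech tag)

# Hypothetical MWE lexicons: tuple of word forms -> tag of the merged unit.
# Each one encodes one of the two competing boundaries for "andare a piedi".
WIDE_MWES: Dict[Tuple[str, ...], str] = {("andare", "a", "piedi"): "V"}
NARROW_MWES: Dict[Tuple[str, ...], str] = {("a", "piedi"): "Adv.loc"}


def merge_mwes(tokens: List[Token], mwes: Dict[Tuple[str, ...], str]) -> List[Token]:
    """Greedy longest-match merge of listed MWEs into single tokens."""
    longest = max((len(key) for key in mwes), default=1)
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, down to spans of two tokens.
        for n in range(min(longest, len(tokens) - i), 1, -1):
            forms = tuple(form for form, _ in tokens[i:i + n])
            if forms in mwes:
                out.append(("_".join(forms), mwes[forms]))
                i += n
                break
        else:
            # No MWE starts here: keep the token as it is.
            out.append(tokens[i])
            i += 1
    return out


# Input tags are assumed for illustration only.
tokens = [("andare", "V"), ("a", "E"), ("piedi", "S")]
wide = merge_mwes(tokens, WIDE_MWES)      # one verbal unit
narrow = merge_mwes(tokens, NARROW_MWES)  # verb + locative adverbial
```

The sketch matches on surface forms only; inflected text would first need lemmatisation, which is left out here.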

Locutions and Figurative usages

per carità, in questione, per caso, in lizza, a volontà, a buon mercato, …
ci mancherebbe!, c'è mancato poco, …
due lavoratori su tre sono a casa (= to be unemployed) [the collocation with ‘lavoratori’ disambiguates the expression]
uomo [di polso], zona medaglia d'oro (= among the first), a cielo aperto (discarica a ..), la bella vita (fare …), …
If the individual components are annotated, the semantic contribution of the MWE is lost:
  acquistare un oggetto a buon (Pos A) mercato (Pos S) !!

Usual issues: “Is there a fixed set of senses?” or “Do senses exist as separate objects?”

Criteria for sense distinction are very application-dependent: greater vs. lesser granularity depends on the task/domain/situation/etc., i.e. the communication purpose
& there is no inherently “true” (upper or lower) limit to the granularity ...
A “checklist theory of meaning” is impossible: meaning as a “piece of information” with an autonomous status independent of its use
Computational resources should provide:
  multi-dimensional information
  the highest expressiveness in terms of sense-discriminating power
  contextual information
Are we dealing with semantic annotation in the right way??

Divergences between Lexicon encoding & Corpus annotation

In the lexicon:
  senses are “de-contextualized” (a necessity to capture generalizations)
  sense discrimination must be kept “under control”
  clustering (manually or automatically)
In the corpus sense annotation task:
  contextualization plays a predominant role
  calls for a range of pragmatic issues
  corpus analysis per se would lead to excessive granularity of sense distinctions
Capture just the core basic distinctions in a core lexicon & acquire additional, more granular info (usu. of collocational nature) from corpora, to be encoded within the broader senses, e.g. to help translation
not yet solved
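
One possible way to picture the proposal above is a lexicon entry that keeps a single broad sense and attaches corpus-acquired collocational refinements to it. The sketch below is only an illustration under stated assumptions: the class `CoreSense`, the function `interpret`, and the sense labels are hypothetical, and the entry reuses the “essere a casa” / ‘lavoratori’ example from the slide on locutions. It does not reproduce the encoding of any existing lexicon.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CoreSense:
    """A broad, de-contextualized sense kept in the core lexicon."""
    label: str
    # Corpus-acquired refinements: collocate -> more granular reading.
    collocations: Dict[str, str] = field(default_factory=dict)


# Hypothetical entry; the labels are illustrative, not taken from a real lexicon.
LEXICON: Dict[str, CoreSense] = {
    "essere a casa": CoreSense(
        label="be at home",
        collocations={"lavoratori": "be unemployed"},
    ),
}


def interpret(expression: str, context: List[str]) -> str:
    """Return a collocation-specific reading if one matches, else the core sense."""
    sense = LEXICON[expression]
    for word in context:
        if word in sense.collocations:
            return sense.collocations[word]
    return sense.label


reading_in_context = interpret("essere a casa", ["due", "lavoratori", "su", "tre"])
reading_default = interpret("essere a casa", ["oggi"])
```

Keeping the refinements inside the broad sense, rather than as separate senses, is the design choice the slide argues for: the core lexicon stays small while contextual detail remains available, for instance to pick a translation.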

Between LRs and Linguistics:

A consequence of the corpus-based approach is that it compels us to break hypotheses too easily taken for granted in mainstream linguistics
In actual usage, a characteristic of language is to display many properties which behave as a continuum, not as “yes/no” properties
The same holds true for so-called “rules”: we more frequently find “tendencies” towards a rule than precise rules
Many of the theoretical rules appear to be simplifications or idealisations, in fact dispelled by real usage
A number of dichotomies must then be reconciled

Lesson learned: [IN-]Adequacy of Lexical resources
A long way to be able to recognise & integrate the many dimensions relevant to content interpretation

A number of “dichotomies” seen not as opposite views, but as complementary perspectives

Language as a continuum:
  rules vs. tendencies
  absolute constraints vs. preferences
  discreteness vs. continuum/gradedness
  theoretical/potential vs. actual
  intuition/introspection vs. empirical evidence
  theory-driven vs. data-driven
  symbolic vs. statistical
The right-hand member of each pair must be highlighted first, and then the two combined
Choices on the syntagmatic axis are pervasive
Lexicon & Corpus must converge