Linguistic Specifications
Penn, December 11 2000
What is SIMPLE?
• use of a common model
• use of a common representation language
• use of a common methodology of building the lexicon
• common Template Types, with default obligatory info (Type defining), and indication of optional info
A subset of the 12 Lexicons crosslingually related:
• choice of a shared set of SemUs (from EWN)
A set of harmonised computational lexicons for HLT applications, geared for multilingual links
PAROLE – SIMPLE Architecture
[Figure: PAROLE – SIMPLE architecture. Labels: MuS, SynU, SemU, Sem Info (Lexical Rel, Sem. Rel, Sem. Feat), TEMPLATE]
Semantic information in SIMPLE
Word senses are encoded as Semantic Units (SemUs), containing the following information:
• Semantic type *
• Domain *
• Lexicographic gloss *
• Qualia structure
• Regular polysemy alternation
• Event type
• Derivation relations
• Synonymy
• Collocations
• Argument structure for predicative SemUs *
• Selection restrictions on the arguments *
• Link of the arguments to the syntactic subcategorization frames (represented in the PAROLE lexicons) *
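The list of slots above can be pictured as a record type. The following is only a minimal sketch, assuming that the starred items are the obligatory ones and that the last three starred items apply to predicative SemUs only. The class names, field names and the frame identifier `frame_x` are hypothetical and not part of the SIMPLE specifications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Predicate:
    """Predicative layer (assumed obligatory for predicative SemUs)."""
    name: str
    arguments: List[str]                    # e.g. ["arg0", "arg1"]
    selection_restrictions: Dict[str, str]  # argument -> restriction
    syntactic_frame: str                    # hypothetical id of a PAROLE frame


@dataclass
class SemU:
    """One word sense; the first four slots have no default value."""
    ident: str
    semantic_type: str
    domain: str
    gloss: str
    predicate: Optional[Predicate] = None
    # Optional information
    qualia: Dict[str, List[str]] = field(default_factory=dict)
    polysemy: Optional[str] = None
    event_type: Optional[str] = None
    derivations: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    collocations: List[str] = field(default_factory=list)


def missing_obligatory(semu: SemU) -> List[str]:
    """Return the obligatory slots that are still empty."""
    missing = [slot for slot in ("semantic_type", "domain", "gloss")
               if not getattr(semu, slot)]
    pred = semu.predicate
    if pred is not None:
        # Every argument of a predicative SemU needs a restriction.
        missing += [f"selection restriction on {arg}"
                    for arg in pred.arguments
                    if arg not in pred.selection_restrictions]
        if not pred.syntactic_frame:
            missing.append("syntactic_frame")
    return missing


look = SemU(
    ident="guardare_2",
    semantic_type="Perception",
    domain="General",
    gloss="osservare con attenzione",
    predicate=Predicate("guardare", ["arg0", "arg1"],
                        {"arg0": "Animate"}, "frame_x"),
)
print(missing_obligatory(look))
```

Here the restriction on `arg1` has been left out on purpose, so the check reports it as missing.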
Some research aspects of the model
On a large scale, for so many languages:
• multiple orthogonal dimensions of meanings (GL) for different POS, e.g.:
• qualia roles, made up by various semantic relations/features (also from Genelex & Acquilex, but reorganised in a coherent structure): the extended qualia structure
• argument structure & selection preferences, linked to the PAROLE syntactic frame
Providing a framework for testing and evaluating the maturity of the current state-of-the-art in lexical semantics
Potential basis for future European multilingual initiatives for HLT applications
Semantic Multidimensionality and NLP
Crucial NLP tasks (IE, WSD, NP Recognition, etc.) need to access multidimensional aspects of word meaning, represented in SIMPLE with the Qualia Relations:
• Is_a_part_of: la pagina del libro (the page of the book)
• Member_of: il difensore della Juventus (Juventus fullback)
• Telic: il suonatore di liuto (the lute player)
• Made_of: il tavolo di legno (the wooden table)
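As an illustration of how a task such as NP interpretation could consult these relations, here is a minimal sketch. The pairing of relation and phrase follows the order on the slide; reducing each phrase to a (head, modifier) pair of lemmas, and the function name `relation_between`, are assumptions made for the example and not SIMPLE's actual encoding.

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    name: str    # qualia relation label from the slide
    source: str  # head noun of the phrase
    target: str  # noun introduced by "di" / "del" / "della"


# One triple per example phrase (lemma-level simplification).
RELATIONS = [
    Relation("Is_a_part_of", "pagina", "libro"),
    Relation("Member_of", "difensore", "Juventus"),
    Relation("Telic", "suonatore", "liuto"),
    Relation("Made_of", "tavolo", "legno"),
]


def relation_between(head: str, modifier: str) -> Optional[str]:
    """Return the relation label linking head and modifier, if any."""
    for rel in RELATIONS:
        if rel.source == head and rel.target == modifier:
            return rel.name
    return None


print(relation_between("tavolo", "legno"))
print(relation_between("tavolo", "libro"))
```

The same surface pattern (noun + di + noun) is thus mapped to four different meaning dimensions, which is the point the examples make.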
Complexity? A constraining, structured model is necessary to enforce uniformity between languages & systematicity in encoding
Great granularity and details in the specs (wrt the TA) implied:
• more work for the Specs Group
• ... a common methodology for the lexicographers, guided by the Templates (also less waste of time)
Templates as a way to organise and classify relevant “clusters” of information for coherent encoding, across sites and languages (distributed building of harmonised lexicons), for later use/tuning of the information in applications and tasks
Overall Organization
[Figure: overall organization. Labels: Type Ontology; Template; Instantiation; Italian lexicon, Catalan lexicon, Danish lexicon, Greek lexicon, …; SemU with Pred. Layer (Predicate, arguments, Selection restrictions), Qualia, Derivation, Polysemy, Event Type]
Template for Semantic Units
• Contextual/Polysemy Information
• Qualia Structure
• Predicative Layer
• Type System Coordinates
SemU: Identifier of a SemU
SynU: Identifier of the SynU to which the SemU is linked
BC Number: Number of the corresponding Base Concept in EuroWordNet
Template_Type: Semantic type of the SemU
Template_Supertype: Semantic type which dominates the type of the SemU in the type-hierarchy
Unification_path: Unification history of a template (only for unified top-types)
Domain: Domain information from ERLI's domain list
Semantic Class: One of WordNet Classes used by ERLI
Gloss: Lexicographic definition
Event Type: Event Sort
Predicative Representation: Predicate associated with the SemU, and its argument structure
Selectional Restr.: Selectional restrictions on the arguments
Derivation: Derivational relations between SemUs
Formal: Formal relation between SemUs
Agentive: Agentive relations between SemUs
Constitutive: Constitutive relations between SemUs; Constitutive semantic features
Telic: Telic relations between SemUs
Synonymy: Synonyms of the SemU
Collocates: Collocate information
Complex: Polysemous class of the SemU
“redundancy”
Perception
Verb Examples: hear, smell, etc.
Noun Examples: sight, look, etc.
Linguistic Tests:
Levin Class: 30.1 (See verb, e.g. detect, see, notice), 30.4 (Stimulus subject, e.g. look, smell)
Comments: Processes involving an experiencing relation, whereby the perception involves the senses of a living entity. The instrument of perception (e.g. eyes for see) is encoded in the Constitutive quale.
Under this template we include both volitional (e.g. look) and non-volitional (e.g. see) events. The difference is expressed as a constitutive feature.
Template for Perception
SemU: 1
Usyn:
BC Number: 105
Template_Type: [Perception]
Template_Supertype: [Psychological_event]
Domain: General
Semantic Class: Perception
Gloss: //free//
Event type: process
Pred_Rep.: Lex_Pred (<arg0>,<arg1>)
Derivation: <Nil> or //Erli's Code//
Selectional Restr.: arg0 = Animate //concept// arg1:default = [Entity]
Formal: isa (1, <SemU>: [Perception])
Agentive: <Nil>
Constitutive: instrument (1, <SemU>: [Body_part])
intentionality ={yes,no} //optional//
Telic: <Nil>
Collocates: Collocates (<SemU1>,...<SemUn>)
Complex: <Nil>
Example
SemU: <guardare_2> //look_2//
Usyn:
BC Number: 105
Template_Type: [Perception]
Template_Supertype:[Psychological_event]
Domain: General
Semantic Class: Perception
Gloss: osservare con attenzione
Event type: process
Pred _Rep.: guardare (<arg0>,<arg1>)
Derivation: <Nil>
Selectional Restr.: arg0 = Animate //concept// arg1:default = [Entity]
Formal: isa (<guardare_2>,<percepire>: [Psychological_event])
Agentive: <Nil>
Constitutive: instrument (<guardare_2>, <occhio>:[body_part])
intentionality ={yes}
Telic: <Nil>
Collocates: Collocates (<SemU1>,...<SemUn>)
Complex: <Nil>
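The relation between a template and an entry that instantiates it can be sketched as a simple consistency check. This is an illustration only: which slots the Perception template fixes, the requirement of a non-empty gloss, and the dictionary layout are assumptions made for the example, not the validation procedure used in the project.

```python
from typing import Dict, List

# Slots assumed to be fixed by the Perception template.
PERCEPTION_TEMPLATE = {
    "Template_Type": "[Perception]",
    "Template_Supertype": "[Psychological_event]",
    "Semantic Class": "Perception",
    "Event type": "process",
}
ALLOWED_INTENTIONALITY = {"yes", "no"}  # optional constitutive feature


def check_instance(entry: Dict[str, str]) -> List[str]:
    """Return a list of mismatches between an entry and the template."""
    problems = []
    for slot, expected in PERCEPTION_TEMPLATE.items():
        found = entry.get(slot)
        if found != expected:
            problems.append(f"{slot}: expected {expected!r}, found {found!r}")
    if not entry.get("Gloss"):
        problems.append("Gloss: free text is missing")
    value = entry.get("intentionality")
    if value is not None and value not in ALLOWED_INTENTIONALITY:
        problems.append(f"intentionality: unexpected value {value!r}")
    return problems


# The <guardare_2> entry, reduced to the slots checked here.
GUARDARE_2 = {
    "SemU": "guardare_2",
    "Template_Type": "[Perception]",
    "Template_Supertype": "[Psychological_event]",
    "Semantic Class": "Perception",
    "Gloss": "osservare con attenzione",
    "Event type": "process",
    "intentionality": "yes",
}

print(check_instance(GUARDARE_2))
```

An entry that departs from the template, for instance with a different event type, would come back with one message per mismatching slot.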
Semantic Relations in SIMPLE
To represent:
• multiple meaning dimensions in a sense - Qualia Rel.
• cross-PoS relations (nominalization etc.) - Derivation Rel.
• regular polysemous classes - Polysemy Rel.
• collocation information - Collocation Rel.
Requirements of Flexibility & Openness
• an extendable framework: to allow coherent future extensions with additional or more specific info
• multipurpose requirements: to make tuning possible for specific applications/text types
Modular Representation of a Semantic Unit
Semantic Relations in SIMPLE
[Figure: a SemU with a Pred. Layer (Predicate, arguments, Selectional restrictions) and a Rel. Layer (Relations between SemUs: Qualia, Derivation, Polysemy, Collocation). Under Top: Formal, Constitutive, Agentive, Telic. Relation labels shown: Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Activity, Instrumental, Is_the_habit_of, Used_for, Used_as, ...]
The targets of relations identify:
• prototypical semantic information associated with a SemU
• elements of dictionary definitions of SemUs
• typical corpus collocates of the SemU
Calcina (mortar)
SemU: 3070
Type: [Artifactual_material]
White substance used as material to build walls
[Figure: relation labels Isa, Used_as, Used_for; targets <sostanza> (substance), <materiale> (material), <costruire> (build)]
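The Calcina example can be read as a small set of typed links grouped by qualia role. The sketch below is an assumption-laden illustration: the pairing of each relation with its target is inferred from the gloss ("substance used as material to build walls"), and the mapping from relation to qualia role follows the relation hierarchy slide as read here. The names `QUALIA_ROLE` and `qualia_structure` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Relation label -> qualia role (only the relations used below).
QUALIA_ROLE = {
    "Isa": "Formal",
    "Is_a_part_of": "Constitutive",
    "Created_by": "Agentive",
    "Used_for": "Telic",
    "Used_as": "Telic",
}

# Links of SemU 3070 (calcina); pairing inferred from the gloss.
CALCINA = [
    ("Isa", "sostanza"),
    ("Used_as", "materiale"),
    ("Used_for", "costruire"),
]


def qualia_structure(
    links: List[Tuple[str, str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (relation, target) links under their qualia role."""
    structure = defaultdict(list)
    for relation, target in links:
        structure[QUALIA_ROLE[relation]].append((relation, target))
    return dict(structure)


print(qualia_structure(CALCINA))
```

Grouping the links this way shows which meaning dimensions are filled for a sense: here the Formal and Telic roles, with nothing under Constitutive or Agentive.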
Ala (wing)
SemU: 3232
Type: [Part]
Part of an airplane
SemU: 3268
Type: [Part]
Part of a building
SemU: D358
Type: [Body_part]
Organ of birds for flying
SemU: 3467
Type: [Role]
Role in football
[Figure: relation labels Isa, Is_a_part_of, Used_for, Agentive; targets <parte> (part), <uccello> (bird), <edificio> (building), <aeroplano> (airplane), <volare> (fly), <fabbricare> (make), <giocatore> (player)]
Relations and Predicates in SIMPLE
[Figure: three SemUs, Sell (V), Sale (N) and Seller (N), and the predicate Pred_SELL <ARG0>, <ARG1>, <ARG2>, <ARG3>; link labels: Event_noun, Is_the_agent_of]
Comprendere V
SemU: 61725
Type: [Cognitive_event]
To understand
SemU: 6962
Type: [Constitutive_state]
To include
Comprensione N
SemU: 61726
Type: [Cognitive_event]
Understanding
Comprendere#1 <Arg1 [+human]>, <Arg2 [+semiotic]>
Comprendere#2 <Arg1 [+group]>, <Arg2>
Link labels: master, master, verb_nominalization
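The Comprendere example suggests that a verb sense and its nominalization point to the same predicate with differently typed links. A minimal sketch of that idea follows; the assignment of each link label to a SemU is an assumption read off the slide layout (the two verb senses as "master" of their own predicate, the noun as "verb_nominalization" of Comprendere#1), and `semus_for` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Predicates with their argument descriptions, as on the slide.
PREDICATES: Dict[str, List[str]] = {
    "Comprendere#1": ["Arg1 [+human]", "Arg2 [+semiotic]"],
    "Comprendere#2": ["Arg1 [+group]", "Arg2"],
}

# (SemU id, link type, predicate); the assignment is assumed.
LINKS: List[Tuple[str, str, str]] = [
    ("61725", "master", "Comprendere#1"),               # comprendere, to understand
    ("6962", "master", "Comprendere#2"),                # comprendere, to include
    ("61726", "verb_nominalization", "Comprendere#1"),  # comprensione
]


def semus_for(predicate: str) -> List[Tuple[str, str]]:
    """List the SemUs that share a predicate, with their link type."""
    return [(semu, link) for semu, link, pred in LINKS if pred == predicate]


print(semus_for("Comprendere#1"))
print(PREDICATES["Comprendere#1"])
```

Under this reading the argument structure is stated once, on the predicate, and reused by every SemU linked to it.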
il difensore di Clinton (Clinton's defender)
il difensore della Juventus (the Juventus fullback)
Difensore N
SemU: 4125
Type: [Role]
Defender
SemU: 3526
Type: [Role]
Fullback
Difendere#1 <Arg1>, <Arg2>
Link labels: agent_nominalization; Is_a_member_of <squadra> (team)
Multidimensional Ontology
1. TELIC [Top]
2. AGENTIVE [Top]
2.1. Cause [Agentive]
3. CONSTITUTIVE [Top]
3.1. Part [Constitutive]
3.1.1. Body_part [Part]
3.2. Group [Constitutive]
3.2.1. Human_group [Group]
3.3. Amount [Constitutive]
4. ENTITY [Top]
4.1. Concrete_entity [Entity]
4.1.1. Location [Concrete_entity]
…
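The fragment of the type hierarchy listed above can be stored as a child-to-parent map, which makes the Template_Supertype of any type a simple lookup. This is a minimal sketch covering only the nodes shown on the slide; the names `PARENT`, `supertypes` and `is_subtype` are hypothetical.

```python
from typing import Dict, List

# Child -> parent, exactly as bracketed in the outline above.
PARENT: Dict[str, str] = {
    "Telic": "Top",
    "Agentive": "Top",
    "Cause": "Agentive",
    "Constitutive": "Top",
    "Part": "Constitutive",
    "Body_part": "Part",
    "Group": "Constitutive",
    "Human_group": "Group",
    "Amount": "Constitutive",
    "Entity": "Top",
    "Concrete_entity": "Entity",
    "Location": "Concrete_entity",
}


def supertypes(sem_type: str) -> List[str]:
    """Return the chain of dominating types, nearest first."""
    chain = []
    while sem_type in PARENT:
        sem_type = PARENT[sem_type]
        chain.append(sem_type)
    return chain


def is_subtype(sem_type: str, ancestor: str) -> bool:
    """True if sem_type equals ancestor or is dominated by it."""
    return sem_type == ancestor or ancestor in supertypes(sem_type)


print(supertypes("Body_part"))
print(is_subtype("Location", "Entity"))
```

A check such as `is_subtype` is the kind of test a selectional restriction like arg0 = [Entity] would need when applied to a candidate argument.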
Usem: 1
BC number: number
Template_Type: [Part]
Template_Supertype: [Constitutive]
Domain: General
Semantic Class: Part + <Semantic Class>
Gloss: //free//
Pred_Rep.: Part_of (<arg0>)
Selectional Restr.: arg0 = [Entity]
Derivation: <Derivational Relation>
Formal: isa (1, <part> or <hyperonym>)
Agentive: <Nil>
Constitutive: is_a_part_of (1, <Usem>: [Constitutive])
Telic: <Nil>
Synonymy: <Nil>
Collocates:Collocates (<Usem1>,...,<Usemn>)
Complex: <Nil>
SIMPLE wrt EAGLES/ISLE Computational Lexicon WG
Multilingual Lexicons (US-EU coop.)
• Last EAGLES work on Lexicon/Semantics used for SIMPLE specifications
• SIMPLE lexicons chosen as a basis for applying & testing EAGLES/ISLE work on defining common guidelines for Multilingual Lexicons
Basic lexical semantic notions
• BASE CONCEPTS, HYPONYMY, SYNONYMY: all applications and enabling technologies
• SEMANTIC FRAMES: MT, IR, IE, & Gen, Pars, MWR, WSD, Coref
• COOCCURRENCE RELATIONS: MT, Gen, Word Clust, WSD, Par
• MERONYMY: MT, IR, IE & Gen, PNR
• ANTONYMY: Gen, Word Clust, WSD
• SUBJECT DOMAIN: MT, SUM, Gen, MWR, WSD
• ACTIONALITY: MT, IE, Gen, Par
• QUANTIFICATION: MT, Gen, Coref
Complementarity wrt EuroWordNet
• Use of a small EWN subset for all languages
• Mappable Top Ontology
• Actual linking of data for a few languages
• Semantic subcategorisation and linking with syntax
• Template structure for the description of SemU
• SemU vs. Synset: basic unit
• Nodes in the Ontology as structured Sem. Types (bundles of different info types)
From SENSEVAL/ROMANSEVAL
Which requirements?
• Common semantic tagset, Gold Standard
• Criteria for sense discrimination (flexible & adaptable) & sense-granularity
• Different dimensions of meanings
• Different disambiguation clues/strategies (interaction syntax & semantics)
• Underspecified readings (regular polysemy)
• MultiWords
• Metaphorical usage
Core Lexicons to be enlarged at the National level
PAROLE/SIMPLE start providing the common platform
Under the subsidiarity principle, the process started at the EU level is continued at the national level:
PAROLE/SIMPLE resources are being enlarged within National Projects (e.g. Danish, Greek, Italian, Portuguese, ...)
This creates a really large infrastructure of harmonised LR throughout Europe, impossible without the fundamental role played by the EC Standards and LRs projects
A major achievement in Europe, where all the difficulties of LRs building are multiplied by the language factor