The Italian CLIPS Lexicon
and its reuse in a bilingual environment
Nilda Ruimy
ILC CNR, Pisa
September 2004
Outline

Part I
• The origin of the CLIPS lexicon
• The PAROLE-SIMPLE model
• General encoding criteria
• Phonological and morphological levels
• Syntactic level: information content
• The semantic lexicon
  • Theoretical background: GL theory
  • The original Qualia Structure
  • The SIMPLE ontology
  • The Extended Qualia Structure
  • Semantic level: information content
  • Predicative structure
• Syntax-semantics mapping
• Encoding methodology
• CLIPS essential features & applications

Part II
• Creating a bilingual resource
• The two scenarios
  • Scenario I: drawbacks
  • Scenario II: the cognate approach, the sense indicator approach
• Results
• Concluding remarks
CLIPS: a bit of genealogy
• PAROLE European project: 12 harmonized lexicons (PAROLE lexicons)
• SIMPLE European project (Semantic Information for Multifunctional Plurilingual Lexica): 12 harmonized lexicons (SIMPLE lexicons)
• Core lexicons: morphology: 20,000 entries; syntax: 20,000 lemmas; semantics: 10,000 senses
• Italy: enlargement of these core lexicons in a national follow-up project
• Sources: PAROLE Corpus (lexical units); DMI (phonology)
• CLIPS lexicon, XML format: phonology: 374,000 entries; morphology: 49,000 entries; syntax: 55,000 lemmas; semantics: 55,000 senses
The PAROLE-SIMPLE Model
GENELEX-PAROLE Representational Model
PAROLE-SIMPLE Theoretical model
• EAGLES recommendations
• Extended GENELEX model
• Results from EU projects:
  • EUROWORDNET
  • ACQUILEX
  • DELIS
• GENERATIVE LEXICON
Common EAGLES-conformant model; common representation language; common building methodology
The Linguistic Model
PAROLE-SIMPLE lexicons:
• Innovative
• Tackles misrepresented areas of knowledge
• Extendible and multifunctional
• Multilingual perspective
REUSABILITY
Representational Model (1)
Entity/Relationship Model, implemented through a DTD that defines:
• the structure of every descriptive element
• the relationships holding among the various descriptive elements, as well as their co-occurrence restrictions
Non-redundant data representation
Representational Model (2)
• Specific representational structures for every level of linguistic description
• Links among the different levels, although the information encoded at each level is perfectly autonomous
General encoding criteria
• Reduce the lexicographer’s margin of subjectivity by setting precise guidelines for the treatment of particular phenomena
• Base the encoding on corpus data as much as possible
• Find a balance between the encoding of attested structures / senses only and an exhaustive encoding including rare structures / senses as well
Splitting entries
• Avoid both redundancy and over-powerful gatherings
• Use criteria strictly relevant to the description level, e.g. at the syntactic level, syntax-driven criteria:
  • arity
  • syntactic function: disporre i libri negli scaffali ('arrange the books on the shelves') / disporre di due auto ('have two cars at one's disposal')
  • complement optionality: attraversare (la strada) ('cross (the street)', lit. sense) / attraversare un momento difficile ('go through a difficult time')
  • different (non-alternative) realization of complements: Leo evita Lia ('Leo avoids Lia') / L. ha evitato di guardare L., che L. si ferisse ('L. avoided looking at L., avoided L. getting hurt')
• Encode, at the semantic level, the most common senses distinguished in average-size dictionaries (ca. 150,000 words)
The four-level architecture
The first three levels
• PhonologicalUnit: stress position, vowel openness, cons. pronunciation
• Corresp. PhnU-MrphU
• MorphologicalUnit: PoS & subcat., inflectional paradigm
• Corresp. MrphU-SynU
• SyntacticUnit:
  • syntactic structure 1: a. head properties; b. subcat. frame (position, synt. restr.)
  • syntactic structure 2: a. head properties; b. subcat. frame (position, synt. restr.)
  • Frameset
Aumentare ('to increase'): complex synt. entry
Il governo ha aumentato i prezzi del 3%. I prezzi sono aumentati del 3%
'The government has increased the prices by 3%. Prices have increased by 3%'
• main verb; aux.: avere
• MAIN syntactic frame:
  • P0: optional, subject, NP
  • P1: oblig., object, NP
  • P2: optional, adverbial, di_PP
• RELATED syntactic frame:
  • P0: optional, subject, NP
  • P1: optional, adverbial, di_PP
• FRAMESET relating systematic frame alternations (decausativization, locative alternation, reciprocal altern., symmetrical altern.):
  • relates main syntactic frame to alternating one
  • relates respective frame positions
Syntactic entry information content
• Specific properties of the entry in the syntactic context described
• Subcategorization frame
• Link between syntactic structures
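The frameset described for aumentare can be pictured as two frames plus a table that relates their positions. The following is a minimal Python sketch, not the CLIPS DTD: the class names, the frame labels "transitive" / "intransitive" and the position mapping (main P1 to related P0, main P2 to related P1) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Position:
    name: str       # P0, P1, ...
    function: str   # subject, object, adverbial
    filler: str     # NP, di_PP
    optional: bool


@dataclass
class Frame:
    label: str
    positions: List[Position]


@dataclass
class Frameset:
    alternation: str
    main: Frame
    related: Frame
    position_map: Dict[str, str]  # main position name -> related position name


main = Frame("transitive", [
    Position("P0", "subject", "NP", True),
    Position("P1", "object", "NP", False),
    Position("P2", "adverbial", "di_PP", True),
])
related = Frame("intransitive", [
    Position("P0", "subject", "NP", True),
    Position("P1", "adverbial", "di_PP", True),
])

# Assumed mapping for the decausative alternation: the object of the main
# frame surfaces as the subject of the related frame.
aumentare = Frameset("decausativization", main, related, {"P1": "P0", "P2": "P1"})


def realigned(fs: Frameset) -> Dict[str, str]:
    """Map each linked function of the main frame to its function in the related frame."""
    main_by_name = {p.name: p for p in fs.main.positions}
    related_by_name = {p.name: p for p in fs.related.positions}
    return {main_by_name[m].function: related_by_name[r].function
            for m, r in fs.position_map.items()}


print(realigned(aumentare))
```

Keeping the position mapping separate from the frames lets one frameset type serve the other alternations listed (locative, reciprocal, symmetrical) with a different table.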
The semantic lexicon
Theoretical linguistic background: extended version of Pustejovsky’s Generative Lexicon (GL) theory
Generative Lexicon theory
Lexical meanings of various levels of complexity:
• simplest ones: definable by a taxonomic relation
• more complex ones: hypernymic relation not sufficient
Examples:
• bambino ('child'): HUMAN, age (childhood), sex (male)
• dottore ('doctor'): HUMAN, age (adult), sex (male), function
• giornale ('newspaper'): 1. printed paper, 2. location, 3. institution, 4. human group (polysemy)
Qualia Structure makes it possible:
• to coherently model the pluridimensionality of meaning
• to represent uniformly semantic units of different degrees of complexity
• to capture the relationships holding btw. semantic units
The Original Qualia structure
Consists of four roles:
• formal role: distinguishes the denoted entity from others (what is X?)
• constitutive role: expresses its components (what is X made of?)
• agentive role: expresses its coming about (how does X come about?)
• telic role: specifies its function (what is X’s function?)
The SIMPLE ontology (1)
Lexicon structured on the basis of a type ontology: 157 language-independent semantic types
• Core Ontology: top-level, general types; large consensus; provide essential information; mappable on EuroWordNet ontology
• Recommended Ontology: hierarchically lower and more specific types; provide finer-grained information
• Possible creation of language- / application-specific types
The SIMPLE ontology (2)
Simple types (one-dimensional): can be fully characterized in terms of a hypernymic relation, e.g.
Entity > Concrete_entity > Living_entity > Animal > Earth_Animal

The SIMPLE ontology (3)
Unified types (multi-dimensional): can only be defined through the combination of:
• the relation to their supertype
• the reference to orthogonal dimensions of meaning
e.g. Entity > Abstract_Entity > Institution [Agentive, Telic]
The SIMPLE ontology (4)
SIMPLE Ontology: multidimensional type hierarchy based on both hierarchical and non-hierarchical conceptual relations
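The difference between simple and unified types can be sketched in a few lines of Python. This is only an illustration built from the two example chains shown on the slides; the `SemType` class, the `ONTOLOGY` table and the helper names are hypothetical and do not reflect the actual SIMPLE encoding.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class SemType:
    name: str
    supertype: Optional[str]
    dimensions: FrozenSet[str] = frozenset()  # empty for simple (one-dimensional) types


ONTOLOGY: Dict[str, SemType] = {t.name: t for t in [
    SemType("Entity", None),
    SemType("Concrete_entity", "Entity"),
    SemType("Living_entity", "Concrete_entity"),
    SemType("Animal", "Living_entity"),
    SemType("Earth_Animal", "Animal"),
    SemType("Abstract_Entity", "Entity"),
    # Unified type: supertype plus orthogonal dimensions of meaning.
    SemType("Institution", "Abstract_Entity", frozenset({"Agentive", "Telic"})),
]}


def subsumes(general: str, specific: str) -> bool:
    """True if `general` equals `specific` or is one of its ancestors."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        current = ONTOLOGY[current].supertype
    return False


def is_unified(name: str) -> bool:
    """A type is unified when it refers to at least one orthogonal dimension."""
    return bool(ONTOLOGY[name].dimensions)


print(subsumes("Concrete_entity", "Earth_Animal"), is_unified("Institution"))
```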
Semantic types
In the SIMPLE ontology, types are not mere labels but the repository of a specific set of structured semantic information
Some semantic types for abstract & concrete entities
[Figure: type hierarchy under TOP (ENTITY, Event, ...), with the orthogonal dimensions TELIC, AGENTIVE and CONSTITUTIVE]
• Higher-level types shown: Concrete_entity, Abstract_entity, Property, Representation
• Other types shown: Living_entity, Human, Animal, Vegetal_entity, Artifact, Substance, Location, Food, Material, Artifactual_material, Furniture, Instrument, Clothing, Artwork, Sign, Language, Information, Quality, Quantity, Physical_prop, Psychol_prop, Convention, Cognitive_fact, ...
Some semantic types for events
[Figure: type hierarchy under EVENT]
• Types shown: Phenomenon, Change, Psych_event, Aspectual, State, Act, Cause_change, Relational_state, Non_relational_act, Relational_act, Move, Cause_act, Relational_change, Change_possession, Change_location, Acquire_knowledge, Natural_transition, Creation, Speech_act, ...
Some semantic types for adjectives
[Figure: type hierarchy under TOP, split into Extensional and Intensional]
• Types shown: Psychological_prop, Social_prop, Physical_prop, Intensifying_prop, Temporal_prop, Relational_prop, Temporal, Modal, Emotive, Manner, Object_related, Emphasizer
Descriptive elements
• Features: PlusHuman, PlusCollective, ...
• Relations between semantic units: R (<SemU1>, <SemU2>)
Extended Qualia Structure: extended roles
• Formal: isa, antonym_comp, antonym_grad, mult_opposition
• Agentive (subtypes shown: AGENTIVE, ARTIFACTUAL AGENTIVE): result_of, agentive_prog, agentive_cause, agentive_experience, caused_by, source; created_by, derived_from
• Telic (subtypes shown: DIRECT TELIC, INSTRUMENTAL, ACTIVITY): used_for, used_as, used_by, used_against; indirect_telic, purpose; object_of_activity; is_the_activity_of, is_the_ability_of, is_the_habit_of
• Constitutive: made_of, is_a_follower_of, has_as_member, is_a_member_of, has_as_part, instrument, kinship, is_a_part_of, resulting_state, relates, uses
  • PROPERTY: causes, concerns, affects, constitutive_activity, contains, has_as_colour, has_as_effect, has_as_property, measured_by, measures, produces, produced_by, property_of, quantifies, related_to, successor_of, precedes, typical_of, feeling
  • LOCATION: is_in, lives_in, typical_location
Examples of pairs of semantic units linked by qualia relations:
• proiettile, colpire (projectile, hit)
• antitarmico, tarma (moth balls, moth)
• bisturi, chirurgo (lancet, surgeon)
• metano, combustibile (methane, fuel)
• casa, costruire (house, build)
• mohair, capra (mohair, goat)
• manubrio, bicicletta (handlebar, bicycle)
• abbaiare, cane (bark, dog)
• arancio, arancia (orange tree, orange)
• medico, curare (doctor, cure)
• fumatore, fumare (smoker, smoke)
• disgusto, provare (disgust, feel)
• senato, senatore (senate, senator)
• pane, farina (bread, flour)
Orthogonal dimensions of meaning
instrument:
• Formal role: is_a
• Constitutive role: is_made_of
• Agentive role: created_by
• Telic role: used_for
violin:
• Formal role: is_a musical_instrument
• Constitutive role: has_as_part strings; is_made_of wood
• Agentive role: created_by make
• Telic role: used_for playing
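The violin example can be read as a list of (relation, target) pairs that are grouped under the four qualia roles. A minimal Python sketch follows; the `ROLE_OF_RELATION` table covers only the relations named on this slide, and the function name is a hypothetical helper, not part of any CLIPS tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Which qualia role each relation belongs to (only the relations on the slide).
ROLE_OF_RELATION: Dict[str, str] = {
    "is_a": "formal",
    "is_made_of": "constitutive",
    "has_as_part": "constitutive",
    "created_by": "agentive",
    "used_for": "telic",
}

violin: List[Tuple[str, str]] = [
    ("is_a", "musical_instrument"),
    ("used_for", "playing"),
    ("created_by", "make"),
    ("has_as_part", "strings"),
    ("is_made_of", "wood"),
]


def qualia_structure(relations: List[Tuple[str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (relation, target) pairs by the qualia role of the relation."""
    roles: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for relation, target in relations:
        roles[ROLE_OF_RELATION[relation]].append((relation, target))
    return dict(roles)


for role, pairs in qualia_structure(violin).items():
    print(role, pairs)
```

Each role holds an independent slice of the meaning, which is what makes the dimensions orthogonal: adding a telic relation never changes the formal one.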
Meaning dimensions expressed by Qualia relations
botte (barrel): traditional dictionary definition, segment by segment
• recipiente ('container'): Formal: isa
• di legno ('made of wood'): Constitutive: made_of
• fatto di doghe arcuate tenute unite da cerchi di ferro ('made of curved staves held together by iron hoops'): Agentive: created_by; Constitutive: made_of
• che serve per la conservazione e il trasporto ('used for storage and transport'): Telic: used_for
• di liquidi, specialmente vino ('of liquids, especially wine'): Constitutive: contains
Qualia informative power (1)
Within a semantic type population, further clusterings can be made through the is-a relation:
[Figure: is-a relations among members of the semantic type INSTRUMENT, under ARTIFACT and CONCRETE_ENTITY]
• Words shown: manufatto; arnese, attrezzo, utensile, strumento, macchina, apparecchio, dispositivo; giogo, spalliera, piano, graticola, aratro, citofono, laser
Qualia informative power (2)
[Figure: clusters inside INSTRUMENT and CONTAINER built from is-a and used for relations]
• INSTRUMENT: utensile; graticola, colabrodo, frusta; posata; coltello, forchetta
• CONTAINER: contenitore; pentola, tegame, padella
• Relations shown: is-a; used for cucinare ('to cook'); used for mangiare ('to eat')
Semantic level: information content
• SemanticUnit: ontological type, semant. class, domain, event type, semant. features, semant. relations, regular polysemy, synonymy, derivation
  • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
  • predicative represent.: predicate, type of link, arguments, sem. restr.
• Corresp. SynU-SemU
• SyntacticUnit: syntactic structures 1 and 2 (a. head properties; b. subcat. frame: position, synt. restr.), Frameset
• Corresp. MrphU-SynU
• MorphologicalUnit: PoS & subcat., inflectional paradigm
• Corresp. PhnU-MrphU
• PhonologicalUnit: stress position, vowel openness, cons. pronunciation
Predicative Representation
Describes the semantic scenario a word sense is involved in. Assigned to predicative semantic units:
• assignment of a lexical predicate
• type of link holding btw. entry and predicate
• predicate argument structure
• semantic role of arguments
• selectional restrictions of arguments
• link semantic arguments / syntactic complements
Assignment of a lexical predicate
• verbs
• predicative nouns: deverbals (costruzione) and collective simple nouns (gruppo), nouns denoting a relation (madre), quantity (bottiglia), part (fetta), unit of measurement (metro), property (bellezza)
• adjectives
• some adverbs (indipendentemente da)
Predicate-semantic unit link
PRED_ACCUSARE
• accusare ('to accuse'): master
• accusatore ('accuser'): agent nominalisation
• accusa ('accusation'): process nominalisation
• accusato ('accused'): patient nominalisation
Semantic arguments: thematic roles
• ProtoAgent: volitional subject of verb: ARG0 of kill
• ProtoPatient: object undergoing an action: ARG1 of kill
• 2ndParticipant: indirect object: ARG2 of give
• SoA (State of Affairs): sentential complement: ARG2 of ask
• Location: ARG2 of put
• Direction: ARG2 of move
• Origin: ARG1 of move
• Kinship: ARG0 of father
• HeadQuantified: ARG0 of metre, bottle
Semantic arguments: selectional restrictions
Not proper restrictions, but rather preferences of combinations in prototypical situations.
Expressible through:
• semantic types
• notions (combination of types or type + feature…)
• features
• semantic units
Features, used transversally across semantic types (e.g. PlusEdible), make it possible to capture wider preferences than single semantic types:
ARG1 eat: [PlusEdible] / ARG1 eat: [FOOD]
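Why a feature captures a wider preference than a single semantic type can be shown with a toy check. In this Python sketch the `SemU` class, the `satisfies` helper and the two entries (including their type and feature assignments) are hypothetical; only the PlusEdible / FOOD contrast comes from the slide.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class SemU:
    name: str
    sem_type: str
    features: Set[str] = field(default_factory=set)


def satisfies(semu: SemU, preference: Tuple[str, str]) -> bool:
    """Check one selectional preference, expressed as ('type', X) or ('feature', X)."""
    kind, value = preference
    if kind == "type":
        return semu.sem_type == value
    if kind == "feature":
        return value in semu.features
    raise ValueError(f"unknown preference kind: {kind}")


# Toy entries: the type and feature assignments are assumptions for illustration.
pane = SemU("pane", "Food", {"PlusEdible"})
mela = SemU("mela", "Vegetal_entity", {"PlusEdible"})

for semu in (pane, mela):
    print(semu.name,
          satisfies(semu, ("type", "Food")),
          satisfies(semu, ("feature", "PlusEdible")))
```

With the type-based preference only the first entry qualifies as ARG1 of eat; with the feature-based preference both do, because the feature cuts across semantic types.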
Semantic entry information content (1)
Aumento ('increase'): L’aumento dei prezzi da parte del governo ('the increase of prices by the government')
Information blocks: ONTOLOGICAL INFO., EXTENDED QUALIA INFO., PREDICATIVE REPRESENTATION
• Semantic type: Cause_change_of_value
• Supertype: Cause_relational_change
• Gloss: accrescimento in dimensione o quantità ('growth in size or quantity')
• Eventype: transition
• Domain: general, economics
• Agentivecause: yes
• Direction: up
• aumento isa cambiamento
• aumento resulting_state maggiore
• Morphological derivation: Eventverb aumentare
• Lexical semantic predicate: PRED_aumentare
• Type of link: event nominalization
• Predicate arg. struct. (range, semantic role & selectional restrictions of args.):
  • Arg0: Protoagent; Human / Institution
  • Arg1: ProtoPatient; Entity
  • Arg2: Quantifier; Amount
Semantic entry information content (2)
vaporizzatore ('spray'): spruzzare acqua con un vaporizzatore ('to spray water with a spray')
Information blocks: ONTOLOGICAL INFO., EXTENDED QUALIA INFO., PREDICATIVE REPRESENTATION
• Semantic type: Instrument
• Supertype: Artifact
• Gloss: apparecchio usato per ridurre in minuscole particelle un liquido ('device used to reduce a liquid to tiny particles')
• Eventype: ===
• Domain: general, cleaning, gardening, cosmetics
• vaporizzatore isa apparecchio
• vaporizzatore has_as_part pulsante
• vaporizzatore created_by fabbricare
• vaporizzatore used_for atomizzare
• Synonymy: nebulizzatore
• Morphological derivation: Eventverb vaporizzare
• Lexical semantic predicate: PRED_vaporizzare
• Type of link: instrument nominalization
• Predicate arg. struct. (range, semantic role & selectional restrictions of args.):
  • Arg0: Protoagent; Human / Instrument
  • Arg1: ProtoPatient; +liquid
  • Arg2: Location; Concrete_entity
Syntax-semantics mapping (1)
[Figure: the SyntacticUnit and SemanticUnit levels of the architecture, linked by Corresp. Syntax-Semantics (Corresp. SynU-SemU)]
• SyntacticUnit: syntactic structures 1 and 2 (a. head properties; b. subcat. frame: position, synt. restr.), Frameset
• SemanticUnit: ontological type, semant. class, domain, event type, semant. features, semant. relations, regular polysemy, synonymy, derivation
  • Extended Qualia Structure: formal role, constitutive role, agentive role, telic role
  • predicative represent.: predicate, type of link, arguments, sem. restr.
Syntax-semantics mapping (2)
SynU_migliorare ‘to improve’
SYNTACTIC LEVEL
• Transitive structure: P0 P1
• Intransitive structure: P0
• Frameset
SEMANTIC LEVEL
• SemU1_migliorare: CAUSE_CHANGE_OF_STATE
• SemU2_migliorare: CHANGE_OF_STATE
SEMANTIC PREDICATE
• PRED_migliorare: ARG0: Agent; ARG1: Patient
LINK PREDICATE-SEMANTIC UNIT
CORRESPONDENCE SYNTACTIC-SEMANTIC FRAME: isomorphic / non-isomorphic
Template-driven encoding methodology
• a template is a schema providing, for each semantic type, a set of structured information deemed crucial to its definition
• twofold function: interface between ontology and lexicon; guide for the lexicographer
• ensures systematicity, consistency and uniformity of representation of the lexical meaning
A template
SemU: SemU identifier
SynU: Identifier of the SynU the SemU is related to
BC number: Number of the corresponding ItalWordNet base concept
Template_Type: [Container]
Unification_path: [Concrete_entity | ArtifactAgentive | Telic]
Domain: General
Semantic Class: Link to the LexiQuest (or any other ontology)
Gloss: Lexicographic gloss
Predicative_Repr.: Predicate associated to the SemU and its argument structure [container(arg0)]
Selectional Restr.: Selectional restrictions (Arg0-HeadQuantifier-Substance)
Derivation: Derivational relations between SemUs
Formal: isa (1, <container> or <hyperonym>)
Agentive: created_by (1, <Usem>: [CREATION]) //definitorial//
Constitutive: made_of (1, <Usem>) //optional//
              has_as_part (1, <Usem>) //optional//
              contains (1, <Usem>)
Telic: used_for (1, <contain>) //definitorial//
       used_for (1, <measure>) //optional//
Synonymy: Synonyms of the SemU //optional//
Regular Polysemy: [Amount] [Container]
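One way a template can guide the lexicographer is by flagging the definitorial relations an entry still lacks. The Python sketch below assumes that only relations marked //definitorial// are mandatory; the dictionary layout, the function name and the toy entry for botte are hypothetical and are not taken from the CLIPS encoding tools.

```python
from typing import Dict, List, Tuple

# Status of each relation in the [Container] template (flagged relations only).
TEMPLATE_CONTAINER: Dict[str, str] = {
    "created_by": "definitorial",
    "used_for": "definitorial",
    "made_of": "optional",
    "has_as_part": "optional",
}


def missing_definitorial(relations: List[Tuple[str, str]],
                         template: Dict[str, str]) -> List[str]:
    """Return the definitorial relations of the template absent from the entry."""
    present = {relation for relation, _ in relations}
    return sorted(relation for relation, status in template.items()
                  if status == "definitorial" and relation not in present)


# Toy entry, partially encoded on purpose.
botte = [
    ("isa", "recipiente"),
    ("made_of", "legno"),
    ("contains", "vino"),
    ("used_for", "contenere"),
]

print(missing_definitorial(botte, TEMPLATE_CONTAINER))
```

A check of this kind supports the systematicity and uniformity goals stated for the methodology, since every entry of a given type is measured against the same schema.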
CLIPS’ key features
• Generic lexicon, large coverage (vocabulary and synt. structures)
• Based on a rich and multifunctional linguistic and representational model shared by 11 other European lexica
• Fine-grained information, highly structured, innovative, most useful for HLT applications
• The largest electronic, multilevel lexical resource of the Italian language
• Lexical description conformant to international standards
• Respect of the principles of uniformity, consistency and exhaustivity
• High level of reusability
• 4 description levels: phonology, morphology, syntax, semantics
• 55,000 words encoded
Application fields
• surface and deep analysis of texts
• information retrieval
• machine translation
• natural language understanding, etc.
The wealth of information the lexicon contains allows:
• building semantic networks
• extracting the vocabulary of a specific domain
• NP recognition: disambiguating the semantic contribution of some PPs in complex nominals
To lend itself to further uses, a lexicon must have:
• flexible model
• generic database
• uniformly structured data
• precise and explicit linguistic description
Like the PAROLE and SIMPLE lexicons, CLIPS does meet these requirements.
Creating a bilingual electronic lexical resource
Strategy I:
1) Use CLIPS and the PAROLE-SIMPLE French lexicon
2) Perform a semi-automatic linking of their respective entries
Creating a bilingual electronic lexical resource
Strategy II:
1) Derive, in a semi-automatic way, a semantically annotated French lexicon from CLIPS
2) Use source and derived lexicons as a basis for building a bilingual resource
Strategy I:
[Figure: an ALGORITHM links CLIPS entries to entries of the PAR-SIMPLE French lex. through an IT-FR & FR-IT bilingual dictionary]
• Sample words: capo, ufficio, gentile, residenza, tessere, pompa, scrivere, tessuto, vestibolo, testo, amministratore, vincere
• CLIPS entries: capo_1 (phon, morph, syn, sem), capo_2, ...; ufficio_1, ...
• PAR-SIMPLE French lex. entries: tête_1 (morph, syn, sem), tête_2, tête_3, ...; bureau_1, ...
• Bilingual dictionary, IT-FR: capo: tête, chef, bout; ufficio: bureau, charge, ...
• Bilingual dictionary, FR-IT: tête: testa, capo, faccia, cima; bureau: ufficio, scrivania, ...
• Open question (?): which Italian sense corresponds to which French sense
Analysis of the inherent properties of the SL & TL senses:
• identity of ontological classification, or subsumption relation btw. the semantic types of the SL & TL senses
• identity of semantic class, or subsumption relation btw. their semantic classes
• identity of domain, or subsumption relation btw. their domain info.
• identity / correspondence of semantic features
• identity / correspondence of semantic relations
Analysis of their contextual properties:
• compatibility of syntactic valency
• function and grammatical instantiation of complements
• compatibility of semantic valency
• semantic role and semantic restrictions of arguments
cf. Villegas et al., LREC 2000, Athens
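The "identity or subsumption" test on inherent properties can be sketched as a small comparison routine. This Python example is an illustration only, not the linking algorithm of Villegas et al.: the function names, the dictionary keys and the decision to skip properties missing on either side are assumptions. The values are those shown for scrivere / écrire.

```python
from typing import Dict, List, Optional, Tuple

# Supertype links, limited to the types needed for the example.
SUPERTYPE: Dict[str, Optional[str]] = {
    "SYMBOLIC_CREATION": "CREATION",
    "CREATION": None,
}


def ancestors(label: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of supertypes of `label`, nearest first."""
    chain: List[str] = []
    current = parent.get(label)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def identical_or_subsumed(a: str, b: str, parent: Dict[str, Optional[str]]) -> bool:
    """Identity, or subsumption in either direction."""
    return a == b or a in ancestors(b, parent) or b in ancestors(a, parent)


def compare(sl: Dict[str, Optional[str]], tl: Dict[str, Optional[str]],
            parent: Dict[str, Optional[str]],
            keys: Tuple[str, ...] = ("sem_type", "sem_class", "domain")) -> Tuple[int, int]:
    """Return (matching properties, comparable properties) for two senses."""
    matches = comparable = 0
    for key in keys:
        a, b = sl.get(key), tl.get(key)
        if a is None or b is None:
            continue  # property not encoded on one side: cannot be compared
        comparable += 1
        if identical_or_subsumed(a, b, parent):
            matches += 1
    return matches, comparable


scrivere = {"sem_type": "SYMBOLIC_CREATION", "sem_class": "CREATION",
            "domain": "CREATIVE_WRITING"}
ecrire = {"sem_type": "CREATION", "sem_class": "CREATION", "domain": None}

print(compare(scrivere, ecrire, SUPERTYPE))
```

Returning the number of comparable properties alongside the matches keeps a pair with sparse encoding from looking as reliable as one supported by many properties.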
evento / évènement
• evento: freedefinition=”cio' che e' accaduto o potra' accadere, avvenimento” ('what has happened or may happen, event'); Semantic type: EVENT; Supertype: ENTITY; Semantic class: EVENT
• évènement: freedefinition="something that happens at a given place and time"; Semantic type: EVENT; Supertype: -----; Semantic class: EVENT

scrivere / écrire
• scrivere: freedefinition=”creare qualcosa di scritto” ('to create something written'); Semantic type: SYMBOLIC_CREATION; Supertype: CREATION; Semantic class: CREATION; Domain: CREATIVE_WRITING
• écrire: freedefinition=”create written works & semi”; Semantic type: CREATION; Supertype: -----; Semantic class: CREATION; Domain: ----

pompa / pompe
• pompa: freedefinition=”macchina o apparecchio usato per sollevare liquidi o comprimere gas” ('machine or device used to lift liquids or compress gases'); Semantic type: INSTRUMENT; UnificationPath: ConcreteEntity-Artifactagentive-Materialtelic; Semantic class: APPARATUS
• pompe: freedefinition="a device that moves fluid or gas by pressure or suction"; Semantic type: -----; UnificationPath: -----; Semantic class: APPARATUS

vincere / vaincre
• vincere (PREDICATE_vincere_1): freedefinition=”portare a termine con successo” ('to complete successfully'); Semantic type: RELATIONAL_ACT; Semantic class: ACTIVITY; Rel.Sem: ----
• vaincre (PREDICATE_vaincre_2): freedef.=”be the winner in contest/competition”; Semantic type: CAUSE_RELAT.-CHANGE; Semantic class: CHANGE; Rel.Sem: Resulting_action/state: victoire; Agentive_cause: cause

testo / texte
• texte: Semantic type: RELATIONAL_ACT; Supertype: -----; Semantic class: OBJECT; Domain: ----; Distinctive feature: PLUS_SEMIOTIC
• testo_1: Semantic type: INFORMATION; Supertype: REPRESENTATION; Semantic class: ABSTRACT; Domain: MEDIA; Distinctive feature: PLUS_SEMIOTIC
• testo_2: Semantic type: SEMIOTIC_ARTIFACT; UnificationPath: ConcreteEntity-Artifactagentive-Telic; Semantic class: ARTIFACT; Domain: MEDIA; Distinctive feature: PLUS_SEMIOTIC
Drawbacks of this strategy
• Discrepancy of lexical coverage between the lexicons => method applicable to 10,000 senses only
• SIMPLE-FR does not always encode all information => necessity of manual intervention wherever SL and TL entries have NO corresponding element due to:
  • encoding error
  • having privileged different although complementary aspects of meaning, e.g.: imprigionare: PURPOSE_ACT vs. emprisonner: CAUSE_RELATIONAL_CHANGE
  • lack of information
Strategy II – Phase 1: Deriving a FR lexicon from CLIPS
Feasibility study for deriving a semantically annotated French lexicon using CLIPS lexical knowledge
Crucial step for deriving the French entries: correctly pair off each FR word sense with the relevant CLIPS semantic unit whose information we ultimately want to assign to the French entry
[Figure: CLIPS → semantically annotated French lexicon, through the cognate approach and the sense indicator approach]
• cognate approach: exploits the cognateness of Italian and French endings to relate the FR word to the IT CLIPS entry and infer the FR entry
  villaggio: 1. (piccolo centro abitato) village 2. (complesso urbanistico) village
• sense indicator approach: matches onto the CLIPS data the information provided in bilingual dictionaries by sense indicators, in order to identify the relevant CLIPS entry
  capo: 1. (testa) tête; 2. (persona che...) chef...
The cognate approach (P. Bouillon, B. Cartoni, TIM/ISSCO, ETI, Geneva)
Look-up in the IT-FR bilingual dict.:
villaggio: 1. (piccolo centro abitato) village 2. (complesso urbanistico) village
Condition: unique French constructed word => translate all IT senses
IT–CLIPS
<SemU id="USem4123villaggio"> naming="villaggio" weightvalsemfeatrel=«Geopolitical_Location» […] </SemU>
<SemU id="USemD63504villaggio" naming="villaggio" weightvalsemfeaturel=«Human_group» […] </SemU>
derivation
FR–LEX
<SemU id="USem0001village"> naming="village" weightvalsemfeaturel=«Geopolitical_Location» […] </SemU>
<SemU id="USem0002village"> naming="village" weightvalsemfeaturel=«Human_group» […] </SemU>
The sense indicator approach (N. Ruimy, ILC-CNR, Pisa)
Triples extracted from the bilingual dictionary, followed by analysis & classification of sense indicators:
IT word | SENSE INDICATOR | FR word
compagnia | (presenza) | compagnie
compagnia | (gruppo) | compagnie
asfalto | (per rivestire) | asphalte
avvertire | (percepire) | sentir
avvertire | (avvisare) | prévenir
aspirare | intr. (avere) prep. a | aspirer à
aspirare | tr. (inalare) | aspirer
aspirare | LING. | aspirer
aspirare | tr. (con un tubo) | aspirer
capo | (testa) | tête
capo | (persona che…) | chef
…
Types of sense indicators (1)
Indicators conveying morphosyntactic information: verb subclass, auxiliary selection, plural form of nouns, typical subject / object, PP type, etc.
Italian–French (Atkins, Bouillon, 2003)
COVARE
A. v.tr.
1 (di uccelli) [dar calore col proprio corpo alle uova per sviluppare l’embrione] couver
2 (fig.) [custodire con gelosia] couver
3 (fig.) [nutrire, alimentare in segreto dentro di sé] nourrir, mijoter
[tramare, macchinare in segreto] couver
[incubare] couver: covare un malanno
B. v.intr. (aus. avere) (fig.) [stare chiuso, nascosto] couver: il fuoco cova sotto la cenere
Annotations: v.tr. and v.intr. = verbal class; (di uccelli) = typical subj.; (aus. avere) = auxiliary
Types of sense indicators (2)
Indicators conveying inferential information: synonyms, hypernyms, meronyms, domain of use
Italian–French
CAPO
I (persone)
1 [testa] tête
2 (fig.) [mente, intelligenza] tête
3 [persona investita di comando, di potere] chef
II (animali)
1 (raro) -> testa
2 spec. al plur [ciascun individuo di una specie determinata] têtes, pièces
III (cose)
1 [la parte più grossa e più sporgente di un oggetto] tête
2 [la parte più alta] haut
3 [ciascuna delle due estremità di qlco.] bout, tête
4 [inizio, principio] début
5 [fine, conclusione; sbocco] bout
6 loc. …..
7 (nei filati) fil
8 [singolo oggetto appartenente ad una serie] pièce
9 (geog.) cap
Annotation labels shown: synonym, hypernym, domain of use
Sense indicators are used as search keys for identifying, in CLIPS, the semantic entry relevant to the IT sense of the bilingual pair:
IT word | SENSE INDICATOR | FR word
gioielleria | (arte) | bijouterie
gioielleria | (negozio) | bijouterie
asfalto | (per rivestire) | asphalte
avvertire | (percepire) | sentir
avvertire | (avvisare) | prévenir
aspirare | intr. (avere) prep. a | aspirer à
aspirare | tr. (inalare) | aspirer
aspirare | LING. | aspirer
aspirare | tr. (con un tubo) | aspirer
capo | (testa) | tête
capo | (persona che…) | chef
…
Using sense indicators
• indicators usable straightforwardly
• indicators to be converted into the descriptive language of CLIPS:
  • illuminare (rendere luminoso) illuminer (to make luminous): sem. type of illuminare belongs to the causative types hierarchy
  • analizzatore (chi effettua analisi) analyseur (who performs analyses): sem. type of analizzatore belongs to the HUMAN hierarchy
Rule types
• search for a CLIPS entry containing the s.i. as target
  • of the synonymic relation, e.g. capo (testa): synonym_rel
  • of the hypernymic relation, e.g. gioielleria (negozio): isa_rel
  • of any qualia relation
• search for a CLIPS entry sharing properties with the entry of the s.i.
  • shared hypernym, e.g. comunicare (notificare): isa_rel dire
  • shared semantic type
• search for a CLIPS entry containing information inferred from the s.i.
  • specific type, e.g. avvertire (percepire): semtype EXP._EVENT
  • specific relation or feature (esp. domain info.)
  • specific syntactic structure, e.g. conoscere (pron. (reciprocamente)): reciprocal syn. struct.
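The first family of rules (the sense indicator is the target of a relation in the CLIPS entry) can be sketched as a small rule cascade. In this Python example the lexicon fragment, the function names and the simplified indicator "persona" (the dictionary gives "persona che…") are assumptions; the identifiers and semantic types of the two capo entries are those quoted in the slides. The other two rule families are not covered.

```python
from typing import Dict, List, Optional, Tuple

# Toy fragment of the lexicon: word -> semantic units with their relations.
CLIPS: Dict[str, List[dict]] = {
    "capo": [
        {"id": "SemU61397capo", "sem_type": "Body_part",
         "relations": {"synonym": ["testa"]}},
        {"id": "SemU3615capo", "sem_type": "Role",
         "relations": {"isa": ["persona"]}},
    ],
}


def find_by_relation_target(word: str, indicator: str,
                            relation: Optional[str] = None) -> List[str]:
    """Ids of the entries of `word` that have `indicator` as target of `relation`.

    With relation=None, any relation counts.
    """
    hits: List[str] = []
    for entry in CLIPS.get(word, []):
        matched = any(indicator in targets
                      for rel, targets in entry["relations"].items()
                      if relation is None or rel == relation)
        if matched:
            hits.append(entry["id"])
    return hits


def lookup(word: str, indicator: str) -> Tuple[Optional[str], List[str]]:
    """Try the rules in order and stop at the first one that finds an entry."""
    for relation in ("synonym", "isa", None):
        hits = find_by_relation_target(word, indicator, relation)
        if hits:
            return relation or "any_relation", hits
    return None, []


print(lookup("capo", "testa"))
print(lookup("capo", "persona"))
```

Stopping at the first successful rule means the order of the rules matters: the most specific and reliable checks should be tried before the broad "any relation" fallback.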
IT word | SENSE INDICATOR | FR word, with the CLIPS entry identified where shown:
• compagnia | (presenza) | compagnie
• compagnia | (gruppo) | compagnie
• asfalto | (per rivestire) | asphalte: SemU68603asfalto, sem. type=Artifact_Material, where <asfalto> used_for <rivestire>
• avvertire | (percepire) | sentir
• avvertire | (avvisare) | prévenir
• aspirare | intr. (avere) prep. a | aspirer à: SemU7040aspirare, sem. type=Modal_event, linked to SynUaspirare, intr. pp_a
• aspirare | tr. (inalare) | aspirer
• aspirare | LING. | aspirer: SemU79372aspirare, sem. type=Speech_act, where domain:phonetics
• aspirare | tr. (con un tubo) | aspirer
• capo | (testa) | tête: SemU61397capo, sem. type=Body_part, where <capo> synonym <testa>
• capo | (persona che…) | chef: SemU3615capo, sem. type=Role, where <capo> isa <persona>
…
Cognate approach: results
Suffix | IT constructed words whose different senses are translated by a unique FR constructed word | IT constructed words having more than one translation
–aggio | 89.9 % | 10.1 %
–tà | 77.4 % | 22.6 %
–zione | 80.4 % | 19.6 %
Recall ratio (FR constructed words sharing the IT CLIPS entries):
–aggio | 99.97 %
–tà | 99.98 %
–zione | 99.98 %
Small percentage of errors, due to a different granularity of sense distinctions in CLIPS and in the bilingual dictionary
Sense indicator approach: results
IT word – sense indicator – FR word: X – A – Y
• Rule type 1: search for an entry of X containing string A
• Rule type 2: search for an entry of X sharing properties with an entry of A
• Rule type 3: search for an entry of X containing information inferred from A
Distribution of success rate over the algorithm rules:
rule type | investigated lex. data | application order | success rate
1 | target of syn. rel. | 1 | 16.6%
1 | target of hyper. rel. | 2 | 26.8%
1 | target of any qualia | 9 | 0.92%
2 | shared hypernym | 7 | 8.9%
2 | shared semtype | 8 | 5.8%
3 | specific semtype | 6 | 3.9%
3 | specific domain | 3 | 12.3%
3 | specific feat/rel | 5 | 9.2%
3 | specific syn. struct | 4 | 15.4%
The higher the rule rank, the more reliable the result
Recall ratio: 69%
Combining the two methods
• constructed words represent 68.2% of the vocabulary
• successful handling of: 95% of constructed words + 69% of non-constructed words
• results may be enhanced by gleaning the most informative sense indicators from different sources
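The three figures can be combined into a rough overall estimate. The weighted average below is not reported in the slides; it is a back-of-the-envelope calculation that assumes the two success rates apply to disjoint parts of the same vocabulary, with constructed words handled by the cognate approach and all other words by the sense indicator approach. The function and variable names are hypothetical.

```python
def combined_coverage(constructed_share: float,
                      success_constructed: float,
                      success_other: float) -> float:
    """Weighted average of the two success rates over the whole vocabulary."""
    return (constructed_share * success_constructed
            + (1.0 - constructed_share) * success_other)


# Figures from the slide: 68.2% constructed words, 95% and 69% success.
overall = combined_coverage(0.682, 0.95, 0.69)
print(f"{overall:.1%}")
```

Under these assumptions the two methods together would handle roughly 87% of the vocabulary.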
Concluding remarks
• Approaches taken applicable to other language pairs sharing similarities in terms of morphological structure
• Derived lexicon building process is simplified and shortened
• Deriving new lexical resources from existing ones: a worthwhile venture in terms of time and effort
• Such practice entails coverage and consistency assessment of the source lexical resource
• Source and derived lexicons constitute a most reliable basis for developing a bilingual resource