COLING Workshop - 2002
Nicoletta Calzolari
ILC - CNR - Pisa, Italy
Language Resources & Semantic Web
To make the Semantic Web a reality ...
…need to tackle the twofold challenge of
• content availability and
• multilinguality
Natural convergence with HLT:
• multilingual semantic processing
• ontologies
• semantic-syntactic computational lexicons
Computational Multilingual Lexicons: an essential component for the Semantic Web
• Language - & lexicons - are the gateway to knowledge
• Semantic Web developers need repositories of words & terms - & knowledge of their relations in language use & ontological classification.
• The cost of adding this structured and machine-understandable lexical information can be one of the factors that delays its full deployment.
• The effort of making available millions of ‘words’ for dozens of languages is something that no small group is able to afford.
• A radical shift in the lexical paradigm - whereby many participants add linguistic content descriptions in an open distributed lexical framework - is required to make the Web usable
Infrastructure of Language Resources...
...static
• Semantic network: Euro-/ItalWordNet
• Lexicons: PAROLE/SIMPLE/CLIPS
• TreeBank
• +sw
But … they will never be “complete”
…dynamic
• Lexical acquisition systems (syntactic & semantic) from text corpora
• Robust systems of morphosyntactic & syntactic analysis
• Word-sense disambiguation systems
International Standards
Italian Semantic Network
Italian module of EuroWordNet (http://www.hum.uva.nl/~ewn/)
~ 50,000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~ 130,000 semantic relations
~ 50,000 hyperonymy/hyponymy relations
~ 16,000 relations among different POS (role, cause, derivation, etc.)
~ 2,000 part-whole relations
~ 1,500 antonymy relations, …etc.
• Synsets linked to the InterLingual Index (ILI=Princeton WordNet),
• Through the ILI, link to all the European WordNets (de-facto standard) & to the common Top Ontology
• Possibility of plug-in with domain terminological lexicons
• Usable in IR, CLIR, IE, QA, ...
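The role of the ILI as a pivot between wordnets can be illustrated with a minimal Python sketch. It is not EuroWordNet code: the synset ids, the ILI record id and the tiny three-language data set are all invented for illustration, and only the linking idea (each synset points to an ILI record, and synsets sharing a record count as cross-lingual equivalents) is taken from the slide.

```python
from collections import defaultdict

# Toy wordnets: language -> synset id -> (lemmas, ILI record id).
# All ids and groupings are illustrative assumptions, not real EuroWordNet data.
WORDNETS = {
    "it": {"it-001": (["mangiare"], "ili-eat")},
    "en": {"en-042": (["eat"], "ili-eat")},
    "es": {"es-007": (["comer"], "ili-eat")},
}


def build_ili_index(wordnets):
    """Group synsets from every language under the ILI record they point to."""
    index = defaultdict(list)
    for lang, synsets in wordnets.items():
        for synset_id, (lemmas, ili) in synsets.items():
            index[ili].append((lang, synset_id, lemmas))
    return index


def equivalents(lemma, source_lang, wordnets, index):
    """Return lemmas in other languages that share an ILI record with `lemma`."""
    result = {}
    for lemmas, ili in wordnets[source_lang].values():
        if lemma not in lemmas:
            continue
        for lang, _synset_id, target_lemmas in index[ili]:
            if lang != source_lang:
                result.setdefault(lang, []).extend(target_lemmas)
    return result


ILI_INDEX = build_ili_index(WORDNETS)
print(equivalents("mangiare", "it", WORDNETS, ILI_INDEX))
```

Because every wordnet links only to the shared index, adding a language in this sketch means adding one more entry to WORDNETS, with no pairwise mapping to the existing languages.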
Domain - Semantic class: mangiare
[Figure: semantic network centred on the verb mangiare; the links between individual words are not recoverable from the extracted text.]
Relations: Used_for, Object_of_the_activity, Is_the_activity_of, Created_by
Roles: AGENTIVE, TELIC
Semantic classes: FURNITURE, INSTRUMENT, BUILDING, CONTAINER, PROFESSION, ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD, VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE
Feature: +edible
Words: tavola, forchetta, posata, ristorante, mestolo, pentola, friggitrice, bollitore, pesce, pesciera, cuoco, coniglio, carne, mela, carota, arrosto, zucchero, alloro, tartufo
Verbs: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, …
machine language learning
• development of conceptual networks
• linguistic learning
• adaptive classification systems
• information extraction
• bootstrapping of grammars
• linguistic change models
• language usage models
• bootstrapping of lexical information
Beyond MILE: towards open & distributed lexicons
Semantic Lexicon: URI = http://www.xxx…
Syntactic Constructions: URI = http://www.yyy…
Ontology: URI = http://www.zzz…
Monolingual/Multilingual Lexicon
Lex_object: semFeature, URI = http://www.xxx…#HUMAN
Lex_object: syntagmaNT, URI = http://www.zzz…#NP
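One way to read the slide is that a lexicon entry does not copy shared objects such as HUMAN or NP, but points to them by URI in separately maintained repositories. The Python sketch below illustrates that reading under stated assumptions: the example.org URIs are hypothetical stand-ins for the elided addresses on the slide, the repositories are simulated as in-memory dictionaries rather than fetched over the network, and the entry for cuoco is an invented example.

```python
from urllib.parse import urldefrag

# Hypothetical stand-ins for the elided repository URIs on the slide.
SEMANTIC_LEXICON = "http://example.org/semantic-lexicon"
SYNTACTIC_CONSTRUCTIONS = "http://example.org/syntactic-constructions"

# Each remote repository is simulated as a dict: fragment id -> shared object.
REPOSITORIES = {
    SEMANTIC_LEXICON: {"HUMAN": {"type": "semFeature", "label": "human"}},
    SYNTACTIC_CONSTRUCTIONS: {"NP": {"type": "syntagmaNT", "label": "noun phrase"}},
}


def resolve(uri):
    """Split a URI into repository and fragment, then look up the shared object."""
    base, fragment = urldefrag(uri)
    repository = REPOSITORIES.get(base)
    if repository is None:
        return None  # unknown repository
    return repository.get(fragment)  # None if the object is not defined there


# Illustrative lexicon entry: it stores references, not copies of the objects.
entry = {
    "lemma": "cuoco",
    "semFeature": SEMANTIC_LEXICON + "#HUMAN",
    "syntagmaNT": SYNTACTIC_CONSTRUCTIONS + "#NP",
}

resolved = {key: resolve(value) for key, value in entry.items() if key != "lemma"}
print(resolved)
```

Keeping only references in the entry means that several lexicons, possibly maintained by different groups, can share one definition of each object, which is the point of an open and distributed framework.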
Target….. Multilingual Knowledge Management
Technical Feasibility:
Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?
Yes, at the lexical level
More complex, for corpus annotation?
EAGLES/ISLE
A few Issues for discussion: lexicon standards
• Semantic Web standards and the needs of content processing technologies:
– importance of reaching consensus on (linguistic and non-linguistic) “content”, in addition to agreement on formats and encoding issues (…words convey content & knowledge)
– short/medium term requirements wrt standards for multilingual lexicons & content encoding, also industrial requirements
• Relation with Spoken language community
• MILE & Asian languages: how to cooperate concretely?
• Define further steps necessary to converge on common priorities
• ….
A few Issues for discussion: “content”, priorities...
• Which types of resources to invest in, wrt short vs. medium term results?
• Need for robust systems, able to acquire/tune lexical/linguistic (also multilingual) knowledge, to auto-enrich static basic resources?
• What is the relation between lexical standards and text annotation protocols?
• Knowledge management is critical. For “content” interoperability, is the field ‘mature’ enough to converge around agreed standards also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
• Is the field of multilingual lexical resources ready to tackle the challenges set by the Semantic Web development?
Towards a new paradigm??
A new paradigm for LR?
Where the focus is on cooperation
New Strategic Vision?
towards a Distributed Open Lexical Infrastructure?
• for distributed & cooperative creation, management, etc. of Lexical Resources
• technical & organisational requirements
“ELITE” (expression of interest for the 6th FP)
“European Lexical Infrastructure and Technology”
New proposed paradigm for lexicon development:
Open & Distributed Lexical Infrastructure
for content description and content interoperability, to make lexical resources usable within the emerging Semantic Web scenario
Language Resources & Semantic Web