Language Resources & Semantic Web
Nicoletta Calzolari, ILC - CNR - Pisa, Italy
COLING Workshop - 2002


Page 1: Nicoletta Calzolari ILC - CNR - Pisa, Italy

COLING Workshop - 2002

Nicoletta Calzolari
ILC - CNR - Pisa, Italy

Language Resources & Semantic Web

Page 2: Nicoletta Calzolari ILC - CNR - Pisa, Italy

To make the Semantic Web a reality ...
…need to tackle the twofold challenge of
• content availability and
• multilinguality

Natural convergence with HLT:
• multilingual semantic processing
• ontologies
• semantic-syntactic computational lexicons

Page 3: Nicoletta Calzolari ILC - CNR - Pisa, Italy

Computational Multilingual Lexicons: an essential component for the Semantic Web

• Language - & lexicons - are the gateway to knowledge
• Semantic Web developers need repositories of words & terms - & knowledge of their relations in language use & ontological classification.
• The cost of adding this structured and machine-understandable lexical information can be one of the factors that delays its full deployment.
• The effort of making available millions of ‘words’ for dozens of languages is something that no small group is able to afford.
• A radical shift in the lexical paradigm - whereby many participants add linguistic content descriptions in an open distributed lexical framework - is required to make the Web usable

Page 4: Nicoletta Calzolari ILC - CNR - Pisa, Italy

Infrastructure of Language Resources...

...static
• Semantic network: Euro-/ItalWordNet
• Lexicons: PAROLE/SIMPLE/CLIPS
• TreeBank
• +sw

But … they will never be “complete”

…dynamic
• Lexical acquisition systems (syntactic & semantic) from text corpora
• Robust systems of morphosyntactic & syntactic analysis
• Word-sense disambiguation systems

International Standards

Page 5: Nicoletta Calzolari ILC - CNR - Pisa, Italy

Italian Semantic Network
Italian module of EuroWordNet (http://www.hum.uva.nl/~ewn/)

~ 50,000 lemmas organized in synonym groups (synsets), structured in hierarchies & linked by ~ 130,000 semantic relations:
• ~ 50,000 hyperonymy/hyponymy relations
• ~ 16,000 relations among different POS (role, cause, derivation, etc.)
• ~ 2,000 part-whole relations
• ~ 1,500 antonymy relations, …etc.

• Synsets linked to the InterLingual Index (ILI = Princeton WordNet)
• Through the ILI, link to all the European WordNets (de-facto standard) & to the common Top Ontology
• Possibility of plug-in with domain terminological lexicons
• Usable in IR, CLIR, IE, QA, ...
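The role of the ILI as a pivot between wordnets can be illustrated with a minimal sketch. It assumes a toy in-memory table; the synset identifiers, the ILI keys, and the function names are invented for illustration and do not reflect the actual EuroWordNet data format.

```python
from collections import defaultdict

# Hypothetical toy data: (language, synset id, lemmas, ILI record id).
# All identifiers here are invented; they are not real EuroWordNet ids.
SYNSETS = [
    ("it", "it-001", ["mangiare"], "ili-eat"),
    ("en", "en-101", ["eat"], "ili-eat"),
    ("es", "es-201", ["comer"], "ili-eat"),
    ("it", "it-002", ["forchetta"], "ili-fork"),
    ("en", "en-102", ["fork"], "ili-fork"),
]


def build_indexes(synsets):
    """Index synsets by (language, lemma) and by ILI record."""
    by_lemma = defaultdict(set)   # (lang, lemma) -> set of ILI ids
    by_ili = defaultdict(list)    # ILI id -> list of (lang, lemmas)
    for lang, _synset_id, lemmas, ili in synsets:
        by_ili[ili].append((lang, lemmas))
        for lemma in lemmas:
            by_lemma[(lang, lemma)].add(ili)
    return by_lemma, by_ili


def translate(lemma, source, target, by_lemma, by_ili):
    """Find target-language lemmas that share an ILI record with the lemma."""
    found = set()
    for ili in by_lemma.get((source, lemma), ()):
        for lang, lemmas in by_ili[ili]:
            if lang == target:
                found.update(lemmas)
    return sorted(found)


BY_LEMMA, BY_ILI = build_indexes(SYNSETS)
```

Because every wordnet links only to the shared index, adding a language means adding links to the ILI rather than pairwise links to every other language, which is one reason a pivot of this kind suits CLIR-style lookups.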

Page 6: Nicoletta Calzolari ILC - CNR - Pisa, Italy

Domain - Semantic class: mangiare

Page 7: Nicoletta Calzolari ILC - CNR - Pisa, Italy

[Figure: Domain - Semantic class network around the verb mangiare (to eat).
Relation labels: Used_for, Object_of_the_activity, Is_the_activity_of, Created_by. Qualia roles: AGENTIVE, TELIC. Feature: +edible.
Semantic classes: FURNITURE, INSTRUMENT, BUILDING, CONTAINER, PROFESSION, ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD, VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE.
Lexical nodes: tavola, forchetta, posata, ristorante, mestolo, pentola, friggitrice, bollitore, pesce, pesciera, cuoco, coniglio, carne, mela, carota, arrosto, zucchero, alloro, tartufo.
Cooking verbs: cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, …]

Page 8: Nicoletta Calzolari ILC - CNR - Pisa, Italy

machine language learning

Page 9: Nicoletta Calzolari ILC - CNR - Pisa, Italy

machine language learning
• development of conceptual networks
• linguistic learning
• adaptive classification systems
• information extraction
• bootstrapping of grammars
• linguistic change models
• language usage models
• bootstrapping of lexical information

Page 10: Nicoletta Calzolari ILC - CNR - Pisa, Italy

Beyond MILE: towards open & distributed lexicons

Semantic Lexicon
URI = http://www.xxx…

Syntactic Constructions
URI = http://www.yyy…

Ontology
URI = http://www.zzz…

Monolingual/Multilingual Lexicon
Lex_object: semFeature
URI = http://www.xxx…#HUMAN
Lex_object: syntagmaNT
URI = http://www.zzz…#NP
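One way to read the slide above: a lexicon entry does not copy shared objects, it points at them by URI, with the fragment naming the object inside a remote resource. The following is a minimal sketch under that assumption. The `example.org` URIs, the in-memory `REMOTE_RESOURCES` table, the sample entry, and the function names are all hypothetical placeholders; the slide's own URIs are elided and MILE's actual encoding is not shown here.

```python
from urllib.parse import urldefrag

# Hypothetical stand-ins for remotely hosted resources. A real system
# would fetch and parse these documents rather than use a dict.
REMOTE_RESOURCES = {
    "http://example.org/semantic-lexicon": {
        "HUMAN": {"type": "semFeature"},
    },
    "http://example.org/syntactic-constructions": {
        "NP": {"type": "syntagmaNT"},
    },
}

# Hypothetical lexicon entry that only stores URI references.
ENTRY = {
    "lemma": "cuoco",
    "refs": [
        "http://example.org/semantic-lexicon#HUMAN",
        "http://example.org/syntactic-constructions#NP",
    ],
}


def resolve(uri, resources=REMOTE_RESOURCES):
    """Split a URI into resource and fragment, then look up the object."""
    base, fragment = urldefrag(uri)
    resource = resources.get(base)
    if resource is None or fragment not in resource:
        return None  # unknown resource or unknown object
    return resource[fragment]


def expand(entry):
    """Map each URI reference in an entry to its resolved object."""
    return {uri: resolve(uri) for uri in entry["refs"]}
```

Keeping references unresolved until lookup time lets independent groups maintain the semantic lexicon, the syntactic constructions, and the ontology separately, which is the distributed aspect the slide points to.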

Page 11: Nicoletta Calzolari ILC - CNR - Pisa, Italy

Target….. Multilingual Knowledge Management
Technical Feasibility:

Prerequisite: is a commonly agreed text/lexicon annotation protocol, also for the semantic/conceptual level, an achievable goal (to be able to automatically establish links among different languages)?

• Yes, at the lexical level
• More complex, for corpus annotation?

EAGLES/ISLE

Page 12: Nicoletta Calzolari ILC - CNR - Pisa, Italy

A few Issues for discussion: lexicon standards

• Semantic Web standards and the needs of content processing technologies:
  – importance of reaching consensus on (linguistic and non-linguistic) “content”, in addition to agreement on formats and encoding issues (…words convey content & knowledge)
  – short/medium term requirements wrt standards for multilingual lexicons & content encoding, also industrial requirements
• Relation with Spoken language community
• MILE & Asian languages: how to cooperate concretely?
• Define further steps necessary to converge on common priorities
• ….

Page 13: Nicoletta Calzolari ILC - CNR - Pisa, Italy

A few Issues for discussion: “content”, priorities...

• Which type of resources to invest in, wrt short vs. medium term results?
• Need for robust systems, able to acquire/tune lexical/linguistic (also multilingual) knowledge, to auto-enrich static basic resources?
• What is the relation betw. lexical standards and text annotation protocols?
• Knowledge management is critical. For “content” interoperability, is the field ‘mature’ enough to converge around agreed standards also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
• Is the field of multilingual lexical resources ready to tackle the challenges set by the Semantic Web development?

Towards a new paradigm??

Page 14: Nicoletta Calzolari ILC - CNR - Pisa, Italy

A new paradigm for LR?
Where the focus is on cooperation

New Strategic Vision?
towards a Distributed Open Lexical Infrastructure?

• for distributed & cooperative creation, management, etc. of Lexical Resources
• technical & organisational requirements

Page 15: Nicoletta Calzolari ILC - CNR - Pisa, Italy

“ELITE” (expression of interest for the 6thFP)
“European Lexical Infrastructure and Technology”

New proposed paradigm for lexicon development:
Open & Distributed Lexical Infrastructure
for content description and content interoperability, to make lexical resources usable within the emerging Semantic Web scenario

Language Resources & Semantic Web