

CLEF 2007 - Budapest

Joint SemEval/CLEF tasks: Contribution of WSD to CLIR

UBC: Agirre, Lopez de Lacalle, Otegi, Rigau

FBK: Magnini

Irion Technologies: Vossen

CLEF 2007 - Budapest 2

WSD and SemEval

Word Sense Disambiguation

"When I went to bed at around two o'clock that night, everyone else was still out in the party."

party:N:1 political organization
party:N:2 social event

Potential for more precise expansion (translation)

SemEval 2007
Framework for semantic evaluations
Under the auspices of SIGLEX (ACL)
19 tasks incl. WSD, SRL, full frames, people, …
> 100 attendees in the ACL workshop
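The "party" example above can be sketched with a toy sense inventory; the sense labels mirror the slide, but the data structure and the most-frequent-sense pick below are illustrative only, not the SemEval data format:

```python
# Toy sense inventory for the "party" example; labels follow the slide,
# everything else is illustrative.
SENSE_INVENTORY = {
    "party": [
        ("party:N:1", "political organization"),
        ("party:N:2", "social event"),
    ],
}

def disambiguate_first_sense(word):
    """First-sense baseline: return the first-listed sense of the word."""
    senses = SENSE_INVENTORY.get(word)
    return senses[0][0] if senses else None

# In the example sentence the correct reading is the social event
# (party:N:2), which is exactly what a context-blind baseline misses.
print(disambiguate_first_sense("party"))  # party:N:1
```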

CLEF 2007 - Budapest 3

Motivation for the task

WSD perspective:
In-vitro evaluations not fully satisfactory
In-vivo evaluations in applications (MT, IR, …)

IR perspective:
Usefulness of WSD for IR/CLIR disputed, but …
Real rather than artificial experiments
Expansion rather than just WSD
Weighted list of senses rather than best sense only
Controlling which words to disambiguate
WSD technology has improved
Coarser-grained senses (90% acc. in SemEval 2007)

CLEF 2007 - Budapest 4

Motivation for the task

Combining WSD and IR: many possible variations
Unfeasible for a single research team
A public common dataset allows the community to explore different combinations.

Tasks where we could hope for a positive impact:
High-recall IR scenarios
Short-passage IR scenarios
Q&A
CLIR

We selected CLIR because of the previous expertise of some of the organizers.

CLEF 2007 - Budapest 5

Two-stage framework

First stage (SemEval 2007 task 01):
Participants: submit WSD results
Sense inventory: WordNet 1.6 (multilinguality)
Organizers:
Expansion / translation strategy fixed
IR/CLIR system fixed (IR as upper bound)

Second stage (proposed CLEF 2008 track):
Organizers: provide several WSD annotations
Participants: submit CLIR results with/without WSD annotations

CLEF 2007 - Budapest 6

Outline

Description of the SemEval task (1st stage)
Evaluation of results (1st stage)
Conclusions (1st stage)
Next step (2nd stage)

CLEF 2007 - Budapest 7

Description of the task: Datasets

CLEF data:
Documents in English: LA94, GH95
170,000 documents, 580 MB raw text
300 topics, both in English and Spanish
Existing relevance judgments

Due to time limitations of the exercise:
16.6% of the document collection (we will have 100% shortly)
Subset of relevance judgments, 201 topics

CLEF 2007 - Budapest 8

Description of the task: Two subtasks for participants

English WSD of the following:
the document collection
the topics

We limit ourselves to English for the time being.

Return WN 1.6 senses.

CLEF 2007 - Budapest 9

Description of the task: Steps of the CLIR/IR system

Step 1: Participants return WSD results

Step 2: Expansion / translation
Multilingual Central Repository (based on EuroWN)
5 languages tightly connected to ILI concepts (WN 1.6 synsets)
Mappings to other WN versions

Example: car sense 1
Expanded to synonyms: automobile
Translated to equivalents: auto, coche
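The car example can be sketched as two table lookups: expand a disambiguated English word to its synset synonyms, and translate it via cross-lingual synset links (the MCR/ILI idea above). The synset id "car%1" and the table entries below are toy values, not real WN 1.6 data:

```python
# Step 2 sketched as lookups in toy tables; ids and entries are illustrative.
SYNSETS_EN = {"car%1": {"car", "automobile"}}   # English synset members
ILI_TO_ES = {"car%1": {"auto", "coche"}}        # Spanish equivalents via ILI

def expand(word, sense_id):
    """Monolingual expansion: all synonyms in the chosen synset."""
    return sorted(SYNSETS_EN.get(sense_id, {word}))

def translate(sense_id):
    """Cross-lingual expansion via the (toy) inter-lingual index."""
    return sorted(ILI_TO_ES.get(sense_id, set()))

print(expand("car", "car%1"))  # ['automobile', 'car']
print(translate("car%1"))      # ['auto', 'coche']
```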

CLEF 2007 - Budapest 10

Description of the task: Steps of the CLIR/IR system

Step 3: IR/CLIR system
Adaptation of TwentyOne (Irion)
Pre-processing: XML
Indexing: detected noun phrases only
Title and description used for queries
Stripped down to vector-space matching
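A stripped-down vector-space matcher of the kind described can be sketched as tf-idf weighting plus cosine similarity; this is a generic illustration with toy documents, not the TwentyOne implementation:

```python
import math
from collections import Counter

# Minimal vector-space retrieval: tf-idf weights over bags of words and
# cosine similarity for ranking. Generic sketch, toy documents.
def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (dict) per whitespace-tokenized doc."""
    df = Counter(t for d in docs for t in set(d.split()))
    n = len(docs)
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d.split()).items()}
            for d in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["car automobile engine",
        "party social event",
        "political party vote"]
vecs = tfidf_vectors(docs)
query = {"automobile": 1.0}   # an expanded query term, unit weight
ranking = sorted(range(len(docs)),
                 key=lambda i: cosine(query, vecs[i]), reverse=True)
print(ranking[0])  # 0: only the expanded document matches the expanded query
```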

CLEF 2007 - Budapest 11

Description of the task: Three evaluation settings

IR with WSD of documents (English):
WSD of English documents
Expansion of senses in the documents

IR with WSD of topics (English):
WSD of English topics
Expansion of senses in the topics
IR as upper bound of CLIR

CLIR with WSD of documents:
WSD of English documents
Translation of English documents
Retrieval using Spanish topics

(CLIR with WSD of topics not run: it would require Spanish WSD)

CLEF 2007 - Budapest 12

Evaluation and results: Participant systems

Participants returned sense-tagged documents and topics.

Two systems participated:
PUTOP from Princeton, unsupervised
UNIBA from Bari, knowledge-based, using WordNet

In-house system: ORGANIZERS, supervised, kNN classifiers

Other baselines:
Noexp: original text
Fullexp: expand to all senses
WSDrand: return a sense at random
1st: return the first sense in WordNet
Wsd50: 50% best senses (in-house WSD system only)
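These runs can be sketched as different selections from one word's weighted sense list. The run labels follow the slide, but the scores are toy values, and "1st" really uses WordNet sense order (approximated here by the top-scoring sense):

```python
import random

# Expansion strategies applied to one word's sense scores
# (sense_id -> WSD score). Toy values for illustration.
def select_senses(scored, strategy, rng=None):
    ranked = sorted(scored, key=scored.get, reverse=True)
    if strategy == "noexp":
        return []                                  # original text only
    if strategy == "fullexp":
        return ranked                              # expand to all senses
    if strategy in ("1st", "wsdbest"):
        return ranked[:1]                          # single best sense
    if strategy == "wsd50":
        return ranked[:max(1, len(ranked) // 2)]   # top 50% of senses
    if strategy == "wsdrand":
        return [(rng or random).choice(ranked)]    # a sense at random
    raise ValueError(strategy)

scores = {"party:N:1": 0.2, "party:N:2": 0.7, "party:N:3": 0.1}
print(select_senses(scores, "wsd50"))  # ['party:N:2']
```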

CLEF 2007 - Budapest 13

Evaluation and results: S2AW and S3AW control

Indication of WSD performance
Not necessarily correlated with IR/CLIR results
Supervised system (ORG) fares better

                        Prec.   Recall   Cov.
Senseval-2 all words
  ORG                   0.584   0.577    93.61%
  UNIBA                 0.498   0.375    75.39%
  PUTOP                 0.388   0.240    61.92%
Senseval-3 all words
  ORG                   0.591   0.566    95.76%
  UNIBA                 0.484   0.338    69.98%
  PUTOP                 0.334   0.186    55.68%
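In standard all-words WSD scoring, precision is computed over attempted words, recall over all words, and coverage is the fraction attempted; a minimal sketch with toy counts (not the table's figures):

```python
# All-words WSD scoring: precision = correct / attempted,
# recall = correct / total, coverage = attempted / total.
# Toy counts for illustration only.
def wsd_scores(correct, attempted, total):
    precision = correct / attempted
    recall = correct / total
    coverage = attempted / total
    return precision, recall, coverage

p, r, c = wsd_scores(correct=50, attempted=80, total=100)
print(p, r, c)  # 0.625 0.5 0.8 (note recall == precision * coverage)
```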

CLEF 2007 - Budapest 14

Evaluation and results: Results (Mean Average Precision, MAP)

IR: noexp best
CLIR: fullexp best; ORG close, but far from IR
Expansion and IR/CLIR system too simple

           IRtops   IRdocs   CLIR
noexp      0.3599   0.3599   0.1446
fullexp    0.1610   0.1410   0.2676
UNIBA      0.3030   0.1521   0.1373
PUTOP      0.3036   0.1482   0.1734
wsdrand    0.2673   0.1482   0.2617
1st        0.2862   0.1172   0.2637
ORG        0.2886   0.1587   0.2664
wsd50      0.2651   0.1479   0.2640

Mean Average Precision
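MAP, the figure reported in the table, averages per-topic average precision: for each topic, take the precision at every rank where a relevant document appears, average those, then average over topics. A minimal sketch with toy document ids:

```python
# Mean Average Precision (MAP) over ranked retrieval results.
# Generic sketch; document ids are toy values.
def average_precision(ranked, relevant):
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: one (ranked_docs, relevant_set) pair per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

print(average_precision(["d1", "d2", "d3"], {"d1", "d3"}))  # (1/1 + 2/3) / 2
```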

CLEF 2007 - Budapest 15

Analysis: number of words in the expansion of documents

IR: the fewer the better (but)
MAP: noexp > ORG > UNIBA
MW: noexp < … < ORG

CLIR: the more the better (but)
MAP: fullexp > ORG > ORG (wsd50)
MW: fullexp > ORG (wsd50) > ORG

WSD allows for more informed expansion

Millions of words (MW) in the expanded collections:

                       Eng.   Sp.
NO WSD     noexp          9     9
           fullexp       93    58
UNIBA      wsdbest       19    17
           wsd50         19    17
PUTOP      wsdbest       20    16
           wsd50         20    16
Baseline   1st           24    20
           wsdrand       24    19
ORG        wsdbest       26    21
           wsd50         36    27

CLEF 2007 - Budapest 16

Conclusions

Main goals met:
First try at evaluating WSD on CLIR
Large dataset prepared and preprocessed
WSD allows for more informed expansion

On the negative side:
Participation low: SemEval overload; 10 teams expressed interest
No improvement over the baseline: expansion and IR/CLIR naive

CLEF 2007 - Budapest 17

Next stage: CLEF 2008

WSD results provided:
WSD of the whole collection
Best WSD systems from SemEval 2007

CLEF teams will be able to try more sophisticated IR/CLIR methods.

Feasibility of a Q/A exercise
Suggestions for cooperation on other tasks are welcome.

Thank you for your attention!

http://ixa2.si.ehu.es/semeval-clir