


Evaluating Cross-language Information Retrieval Systems

Carol Peters

IEI-CNR

SPINN Seminar, Copenhagen, 26-27 October 2001


Outline

Why IR System Evaluation is Important

Evaluation programs

An Example


What is an IR System Evaluation Campaign?

An activity which tests the performance of different systems on a given task (or set of tasks) under standard conditions

Permits contrastive analysis of approaches/technologies


How well does a system meet an information need?

System evaluation:

how good are document rankings?

User-based evaluation:

how satisfied is the user?


Why we need Evaluation

evaluation permits hypotheses to be validated and progress assessed

evaluation helps to identify areas where more R&D is needed

evaluation saves developers time and money

CLIR systems are still at an experimental stage: evaluation is particularly important!


CLIR System Evaluation is Complex

CLIR systems integrate many components and technologies:

need to evaluate single components
need to evaluate overall system performance
need to distinguish methodological aspects from linguistic knowledge


Technology vs. Usage Evaluation

Usage Evaluation:
shows the value of a technology for the user
determines the technology thresholds that are indispensable for specific usage
provides directions for the choice of criteria for technology evaluation

Influence of language and culture on usability of technology needs to be understood


Organising an Evaluation Activity

select control task(s)
provide data to test and tune systems
define the protocol and metrics to be used in results assessment

Aim is an objective comparison between systems and approaches


Test Collection

Set of documents - must be representative of the task of interest; must be large

Set of “topics” - statements of user needs from which the system data structure (the query) is extracted

Relevance judgments - judgments vary by assessor, but there is no evidence that these differences affect the comparative evaluation of systems
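To make these three components concrete, here is a minimal sketch of how a test collection is often represented in code (the identifiers and structures below are illustrative, not the actual CLEF file formats):

```python
# Illustrative sketch of a test collection's three components.
# All ids and texts are invented for illustration.

documents = {
    "doc001": "European industry faces growing competition ...",
    "doc002": "The new trade agreement covers agricultural goods ...",
}

topics = {
    "41": {
        "title": "European Industry",
        "description": "What factors damage the competitiveness of European industry?",
    },
}

# Relevance judgments (qrels): topic id -> set of documents judged relevant.
qrels = {
    "41": {"doc001"},
}
```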


Using Pooling to Create Large Test Collections

Assessors create topics.

A variety of different systems retrieve the top 1000 documents for each topic.

Form pools of the unique documents from all submissions, which the assessors judge for relevance.

Systems are evaluated using the relevance judgments.

Ellen Voorhees – CLEF 2001 Workshop
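A minimal sketch of the pooling step, assuming each submitted run is simply a ranked list of document ids per topic (names and data are illustrative):

```python
from typing import Dict, List, Set

def build_pools(runs: List[Dict[str, List[str]]], depth: int = 1000) -> Dict[str, Set[str]]:
    """For each topic, form the pool of unique documents found in the top
    `depth` results of every submitted run; assessors then judge only the
    documents in this pool."""
    pools: Dict[str, Set[str]] = {}
    for run in runs:                          # one run = {topic_id: ranked doc ids}
        for topic_id, ranking in run.items():
            pools.setdefault(topic_id, set()).update(ranking[:depth])
    return pools

# Example with two toy runs and a pool depth of 2:
run_a = {"41": ["doc003", "doc001", "doc007"]}
run_b = {"41": ["doc001", "doc009", "doc003"]}
print(build_pools([run_a, run_b], depth=2))   # pool of 3 unique documents for topic 41
```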


Cross-language Test Collections

Consistency harder to obtain than for monolingual:
parallel or comparable document collections
multiple assessors per topic creation and relevance assessment (for each language)
must take care when comparing different language evaluations (e.g., cross run to mono baseline)

Pooling harder to coordinate:
need to have large, diverse pools for all languages
retrieval results are not balanced across languages

Taken from Ellen Voorhees – CLEF 2001 Workshop


Evaluation Measures

Recall: measures the ability of the system to find all relevant items

recall = (no. of relevant items retrieved) / (no. of relevant items in the collection)

Precision: measures the ability of the system to find only relevant items

precision = (no. of relevant items retrieved) / (total no. of items retrieved)

Recall-Precision Graph is used to compare systems
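As a small illustration, both measures can be computed per topic from a system's ranked result list and the relevance judgments (a sketch, not the official evaluation software):

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one topic.
    retrieved: ranked list of document ids returned by the system
    relevant:  set of document ids judged relevant for the topic
    """
    retrieved_relevant = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = retrieved_relevant / len(retrieved) if retrieved else 0.0
    recall = retrieved_relevant / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 3 of the 4 retrieved documents are relevant, out of 6 relevant in total.
p, r = precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d4", "d7", "d8", "d9"})
print(p, r)   # 0.75 0.5
```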


Main CLIR Evaluation Programs

TIDES: sponsors TREC (Text REtrieval Conference) and TDT (Topic Detection and Tracking) - Chinese-English tracks in 2000; in 2001 the TREC track focusses on retrieving Arabic documents with English/French queries

NTCIR: National Institute of Informatics, Tokyo - Chinese-English and Japanese-English C-L tracks

AMARYLLIS: focused on French; the 1998-99 campaign included a C-L track; the 3rd campaign begins in September 2001

CLEF: Cross-Language Evaluation Forum - C-L evaluation for European languages


Cross-Language Evaluation Forum

Funded by the DELOS Network of Excellence for Digital Libraries and the US National Institute of Standards and Technology (2000-2001)

Extension of the CLIR track at TREC (1997-1999)

Coordination is distributed - national sites for each language in the multilingual collection


CLEF Partners (2000-2001)

Eurospider, Zurich, Switzerland (Peter Schäuble, Martin Braschler)

IEEC-UNED, Madrid, Spain (Felisa Verdejo, Julio Gonzalo)
IEI-CNR, Pisa, Italy (Carol Peters)
IZ Sozialwissenschaften, Bonn, Germany (Michael Kluck)
NIST, Gaithersburg MD, USA (Donna Harman, Ellen Voorhees)
University of Hildesheim, Germany (Christa Womser-Hacker)
University of Twente, The Netherlands (Djoerd Hiemstra)


CLEF - Main Goals

Promote research by providing an appropriate infrastructure for:
CLIR system evaluation, testing and tuning
comparison and discussion of results
building of test suites for system developers


CLEF 2001 Task Description

Four main evaluation tracks in CLEF 2001:
multilingual information retrieval
bilingual IR
monolingual (non-English) IR
domain-specific IR

plus experimental track for interactive C-L systems


CLEF 2001 Data Collection

Multilingual comparable corpus of news agency and newspaper documents in six languages (DE, EN, FR, IT, NL, SP); nearly 1 million documents

Common set of 50 topics (from which queries are extracted) created in 9 European languages (DE, EN, FR, IT, NL, SP + FI, RU, SV) and 3 Asian languages (JP, TH, ZH)


CLEF 2001 Creating the Queries

Title: European Industry

Description: What factors damage the competitiveness of European industry on the world's markets?

Narrative: Relevant documents discuss factors that render European industry and manufactured goods less competitive with respect to the rest of the world, e.g. North America or Asia. Relevant documents must report data for Europe as a whole rather than for single European nations.

Queries are extracted from topics: 1 or more fields
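A sketch of this extraction step, assuming the topic has already been parsed into a simple field dictionary (the real topics use a TREC-style tagged format; the code only illustrates building T, TD or TDN queries):

```python
topic = {
    "title": "European Industry",
    "description": "What factors damage the competitiveness of European industry "
                   "on the world's markets?",
    "narrative": "Relevant documents discuss factors that render European industry "
                 "and manufactured goods less competitive ...",
}

def extract_query(topic, fields=("title", "description")):
    """Build the query text from one or more topic fields (T, TD or TDN runs)."""
    return " ".join(topic[f] for f in fields)

print(extract_query(topic, ("title",)))                               # T run
print(extract_query(topic, ("title", "description", "narrative")))    # TDN run
```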


CLEF 2001 Creating the Queries

Distributed activity (Bonn, Gaithersburg, Pisa, Hildesheim, Twente, Madrid)

Each group produced 13-15 queries (topics), 1/3 local, 1/3 European, 1/3 international

Topic selection at a meeting in Pisa (50 topics)

Topics were created in DE, EN, FR, IT, NL, SP and additionally translated into SV, RU, FI and TH, JP, ZH

Cleanup after topic translation


CLEF 2001 Multilingual IR

[Diagram: topics in any of DE, EN, FR, IT, FI, NL, SP, SV, RU, ZH, JP or TH are submitted to the participant's cross-language information retrieval system, which searches the English, German, French, Italian and Spanish document collections and returns one result list of DE, EN, FR, IT and SP documents ranked in decreasing order of estimated relevance.]


CLEF 2001 Bilingual IR

Task: query English or Dutch target document collections

Goal: retrieve documents in the target language, presenting results in a ranked list

An easier task for beginners!


CLEF 2001 Monolingual IR

Task: querying document collections in FR|DE|IT|NL|SP

Goal: acquire a better understanding of language-dependent retrieval problems

different languages present different retrieval problems

issues involved include word order, morphology, diacritic characters, language variants
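For instance, diacritic characters are often normalised away at indexing time so that accented and unaccented spellings of a word match; a minimal sketch using Unicode decomposition:

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Remove combining accent marks after Unicode decomposition (NFD),
    so that e.g. 'qualité' and 'qualite' index to the same term."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("économie européenne"))   # economie europeenne
```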


CLEF 2001 Domain-Specific IR

Task: querying a structured database from a vertical domain (social sciences) in German

German/English/Russian thesaurus and English translations of document titles

Monolingual or cross-language task

Goal: understand the implications of querying in a domain-specific context


CLEF 2001 Interactive C-L

Task: interactive document selection in an “unknown” target language

Goal: evaluation of results presentation rather than system performance


CLEF 2001: Participation

[Chart: participants by region - Europe, North America, Asia]

34 participants, 15 different countries


Details of Experiments

Track              # Participants   # Runs/Experiments
Multilingual             8                26
Bilingual to EN         19                61
Bilingual to NL          3                 3
Monolingual DE          12                25
Monolingual ES          10                22
Monolingual FR           9                18
Monolingual IT           8                14
Monolingual NL           9                19
Domain-specific          1                 4
Interactive              3                 6


Runs per Topic Language

[Chart: number of runs per topic language - Dutch, English, French, German, Italian, Spanish, Chinese, Finnish, Japanese, Russian, Swedish, Thai]


Topic Fields

[Chart: number of runs by combination of topic fields used (title, description, narrative; e.g. T, TD, TDN)]


CLEF 2001 Participation

CMU, Eidetica, Eurospider *, Greenwich U, HKUST, Hummingbird, IAI *, IRIT *, ITC-irst *, JHU-APL *, Kasetsart U, KCSL Inc., Medialab, Nara Inst. of Tech., National Taiwan U, OCE Tech. BV, SICS/Conexor, SINAI/U Jaen, Thomson Legal *, TNO TPD *, U Alicante, U Amsterdam, U Exeter, U Glasgow *, U Maryland * (interactive only), U Montreal/RALI *, U Neuchâtel, U Salamanca *, U Sheffield * (interactive only), U Tampere *, U Twente (*), UC Berkeley (2 groups) *, UNED (interactive only)

(* = also participated in 2000)


CLEF 2001 Approaches

All traditional approaches used:
commercial MT systems (Systran, Babelfish, Globalink Power Translator, ...); both query and document translation tried
bilingual dictionary look-up (on-line and in-house tools; a toy sketch follows below)
aligned parallel corpora (web-derived)
comparable corpora (similarity thesaurus)
conceptual networks (EuroWordNet, ZH-EN wordnet)
multilingual thesaurus (domain-specific task)
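A toy sketch of the bilingual dictionary look-up approach: each query term is replaced by its translations from a bilingual word list (the miniature dictionary below is invented for illustration; real systems add weighting and translation disambiguation):

```python
# Hypothetical miniature German->English word list, for illustration only.
bilingual_dict = {
    "industrie": ["industry"],
    "wettbewerb": ["competition", "contest"],
    "europäische": ["european"],
}

def translate_query(terms, dictionary):
    """Replace each source-language term by all its target-language
    translations; untranslatable terms are kept as-is."""
    translated = []
    for term in terms:
        translated.extend(dictionary.get(term.lower(), [term]))
    return translated

print(translate_query(["europäische", "Industrie", "Wettbewerb"], bilingual_dict))
# ['european', 'industry', 'competition', 'contest']
```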


CLEF 2001 Techniques Tested

Text processing for multiple languages:
Porter stemmer, Inxight commercial stemmer, on-site tools
simple generic “quick & dirty” stemming
language-independent stemming
separate stopword lists vs a single list
morphological analysis
n-gram indexing, word segmentation, decompounding (e.g. Chinese, German; a sketch follows below)
use of NLP methods, e.g. phrase identification, morphosyntactic analysis
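Character n-gram indexing, one of the language-independent techniques listed above, can be sketched as follows (a simple illustration, not any particular group's implementation):

```python
def char_ngrams(text: str, n: int = 4):
    """Index terms as overlapping character n-grams, which largely
    sidesteps stemming, decompounding and word segmentation."""
    ngrams = []
    for word in text.lower().split():
        padded = f"_{word}_"                      # mark word boundaries
        if len(padded) <= n:
            ngrams.append(padded)
        else:
            ngrams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return ngrams

print(char_ngrams("Versicherungsgesellschaft"))   # overlapping 4-grams of the compound
```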


CLEF 2001 Techniques Tested

Cross-language strategies included:
integration of methods (MT, corpora and MRDs)
pivot language to translate from L1 to L2 (DE -> FR, SP, IT via EN)
n-gram based technique to match untranslatable words
prior- and post-translation pseudo-relevance feedback (query expanded by associating frequent co-occurrences; a sketch follows below)
vector-based semantic analysis (query expanded by associating semantically similar terms)
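A minimal sketch of post-translation pseudo-relevance feedback: the query is expanded with the most frequent terms from the top-ranked documents of an initial retrieval pass (term selection in real systems is usually more sophisticated, e.g. weighted by collection statistics):

```python
from collections import Counter

def pseudo_relevance_feedback(query_terms, top_docs, n_expand=5, stopwords=frozenset()):
    """Expand the query with the most frequent terms occurring in the
    top-ranked documents of an initial run (pseudo-relevance feedback)."""
    counts = Counter()
    for doc_text in top_docs:
        counts.update(t for t in doc_text.lower().split()
                      if t not in stopwords and t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_expand)]
    return list(query_terms) + expansion

initial = ["european", "industry", "competitiveness"]
docs = ["european manufacturers face rising labour costs and exchange rates",
        "exchange rates and labour costs hurt exports of european industry"]
print(pseudo_relevance_feedback(initial, docs, n_expand=3, stopwords={"and", "of", "the"}))
```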


CLEF 2001 Techniques Tested

Different strategies were tried for merging the results

This remains an unsolved problem
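Two of the simpler merging strategies can be sketched as follows (illustrative only; neither solves the underlying problem that scores from different collections are not directly comparable):

```python
def raw_score_merge(result_lists, k=1000):
    """Merge per-language result lists by sorting on the raw retrieval
    scores, assuming (often wrongly) that scores are comparable."""
    merged = [item for results in result_lists for item in results]   # (doc_id, score)
    return sorted(merged, key=lambda pair: pair[1], reverse=True)[:k]

def round_robin_merge(result_lists, k=1000):
    """Merge by interleaving: take the best remaining document from each
    language's list in turn, ignoring the scores entirely."""
    merged, rank = [], 0
    while len(merged) < k and any(rank < len(r) for r in result_lists):
        for results in result_lists:
            if rank < len(results):
                merged.append(results[rank])
        rank += 1
    return merged[:k]

german = [("de_12", 14.2), ("de_07", 11.8)]
french = [("fr_03", 0.92), ("fr_10", 0.88)]   # scores on a different scale
print(raw_score_merge([german, french], k=3))
print(round_robin_merge([german, french], k=3))
```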


CLEF 2001 Workshop

Results of CLEF 2001 campaign presented at Workshop, 3-4 September 2001, Darmstadt, Germany

50 researchers and system developers from academia and industry participated.

Working Notes containing preliminary reports and statistics on the CLEF 2001 experiments were distributed.


CLEF-2001 vs. CLEF-2000

Most participants were back
Less MT, more corpus-based
People really started to try each other's ideas/methods:
corpus-based approaches (parallel web, alignments)
n-grams
combination approaches


“Effect” of CLEF

Many more European groups
Dramatic increase in work on stemming/decompounding (for languages other than English)
Work on mining the web for parallel texts
Work on merging (breakthrough still missing?)
Work on combination approaches


CLEF 2002

Accompanying Measure under the IST programme: Contract No. IST-2000-31002. October 2001

CLEF Consortium: IEI-CNR, Pisa; ELRA/ELDA, Paris; Eurospider, Zurich; UNED, Madrid; NIST, USA; IZ Sozialwissenschaften, Bonn

Associated Members: University of Hildesheim, University of Twente, University of Tampere (?)


CLEF 2002 Task Description

Similar to CLEF 2001:
multilingual information retrieval
bilingual IR (not to English!)
monolingual (non-English) IR
domain-specific IR
interactive track

Plus feasibility study for spoken document track (within DELOS – results reported at CLEF)

Possible coordination with AMARYLLIS


CLEF 2002 Schedule

Call for Participation - November 2001
Document release - 1 February 2002
Topic release - 1 April 2002
Runs received - 15 June 2002
Results communicated - 1 August 2002
Papers for Working Notes - 1 September 2002
Workshop - 19-20 September 2002


Evaluation - Summing up

system evaluation is not a competition to find the best

evaluation provides opportunity to test, tune, and compare approaches in order to improve system performance

an evaluation campaign creates a community interested in examining the same issues and comparing ideas and experiences


Cross-Language Evaluation Forum

For further information see:

http://www.clef-campaign.org

 

or contact:

Carol Peters - IEI-CNR

E-mail: [email protected]