
Océ at CLEF 2003


Page 1: Océ at CLEF 2003

Océ at CLEF 2003

Roel Brand

Marvin Brünner

Samuel Driessen

Jakob Klok

Pascha Iljin

Page 2: Océ at CLEF 2003

Outline

• Océ mission

• Participation in 2001, 2002

• Participation in 2003: three models

• Results

• Conclusions

• Remark on evaluation measures

Page 3: Océ at CLEF 2003

Océ-Technologies B.V.

Mission: to enable people to share information by offering products and services for the reproduction, presentation, distribution and management of documents.

• active in approximately 80 countries

• 23,000 people worldwide

• Research: > 2,000 employees

Page 4: Océ at CLEF 2003

Participation in 2001, 2002

2001: Dutch mono-lingual task

2002: all mono-lingual tasks, several cross-lingual tasks, the multi-lingual task

Page 5: Océ at CLEF 2003

Participation in 2003

Mono-lingual tasks

3 ranking models:

• BM25

• probabilistic

• statistical

Page 6: Océ at CLEF 2003

Query

BM25, probabilistic:

Topic (title + description) → parsing → stop word removal → Query

Page 7: Océ at CLEF 2003

Query

statistical:

Topic (title + description) → parsing → stop word removal → Query

+ compound splitting, morphological variations
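
Both pipelines are straightforward to sketch. A minimal illustration in Python: STOP_WORDS and split_compound are hypothetical placeholders, since the slides do not specify the actual stop list or morphological resources used.

```python
import re

# Toy stop list; the real one is language-specific and much larger.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "for", "to"}

def parse(text):
    """Lowercase the topic text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def split_compound(token):
    """Hypothetical compound splitter; real Dutch splitting needs a lexicon."""
    return []  # placeholder

def build_query(title, description, statistical=False):
    # The query is built from the topic's title + description fields.
    tokens = parse(title + " " + description)
    # Stop words are removed from queries (but kept in the index, next slide).
    tokens = [t for t in tokens if t not in STOP_WORDS]
    if statistical:
        # The statistical model additionally expands the query with
        # compound parts and morphological variations of each term.
        tokens += [part for t in tokens for part in split_compound(t)]
    return tokens
```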

Page 8: Océ at CLEF 2003

Indexing

parsing

stop words are not removed
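
A toy sketch of such an index, assuming the same regex-based parsing as on the query side (the slides do not describe the actual parser); per the slide, stop words are kept:

```python
import re
from collections import Counter, defaultdict

def build_index(docs):
    """Build a toy inverted index from {doc_id: text}.
    Stop words are deliberately NOT removed at indexing time."""
    index = defaultdict(dict)   # term -> {doc_id: term frequency}
    doc_len = {}                # doc_id -> document length in tokens
    for doc_id, text in docs.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len
```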

Page 9: Océ at CLEF 2003

Ranking functions

BM25: k1 & b parameters set to the best match for the 2002 Dutch data

probabilistic: urn model, coordination level ranking

statistical: a set of clues with degrees of significance
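
For reference, a sketch of the standard Okapi BM25 weighting that the k1 and b parameters above belong to. The values tuned on the 2002 Dutch data are not given on the slide, so the defaults below are just common choices, and the exact variant used may differ in details.

```python
import math

def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, n_docs,
               k1=1.2, b=0.75):  # placeholders; the paper tuned these on 2002 Dutch data
    """Standard Okapi BM25 score of one document for a query.

    doc_tf: term -> frequency in this document
    df:     term -> number of documents containing the term
    """
    score = 0.0
    for term in query_terms:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in df:
            continue
        # Robertson/Sparck Jones-style idf component
        idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5))
        # tf component with document-length normalisation controlled by b
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score
```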

Page 10: Océ at CLEF 2003

Results

Name of the run       Retrieved relevant documents   Average precision   R-precision
Swedish BM25          729 out of 889                 0.3584              0.3585
Swedish probabil.     633 out of 889                 0.2716              0.2743
Italian BM25          759 out of 809                 0.4361              0.4287
Italian probabil.     731 out of 809                 0.3805              0.3865
French BM25           894 out of 946                 0.4601              0.4273
French probabil.      865 out of 946                 0.4188              0.4044
Finnish BM25          417 out of 483                 0.3570              0.3230
Finnish probabil.     407 out of 483                 0.3031              0.2624
Spanish BM25          2109 out of 2368               0.4156              0.4094
Spanish probabil.     2025 out of 2368               0.3500              0.3696
German BM25           1482 out of 1825               0.3858              0.3838
German probabil.      1337 out of 1825               0.3017              0.3088
Dutch BM25            1438 out of 1577               0.4561              0.4438
Dutch probabil.       1336 out of 1577               0.4049              0.3652
Dutch statist. 2001   1375 out of 1577               0.4253              0.3940
Dutch statist. 2002   1378 out of 1577               0.4336              0.3983
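
For readers unfamiliar with the two measures in the table, a compact sketch of how each is computed from one query's ranked result list and its set of known relevant documents; the reported average precision column is this per-query value averaged over all topics of a task.

```python
def average_precision(ranking, relevant):
    """Mean of precision@k over the ranks k where a relevant doc appears;
    relevant documents that are never retrieved contribute zero."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def r_precision(ranking, relevant):
    """Precision after retrieving R documents, where R = |relevant|."""
    r = len(relevant)
    return sum(doc in relevant for doc in ranking[:r]) / r if r else 0.0
```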

Page 11: Océ at CLEF 2003

Conclusions

• the BM25 model outperforms the probabilistic one

• mathematical correctness is not the best guideline

• for a better retrieval model: 'knowledge' about the data collection, the topics and the assessments

Page 12: Océ at CLEF 2003

Remark on evaluation measures

Dutch data from 2001

top T=1000 docs retrieved per run; top N are read (judged); M participants

=> at most N*M relevance judgements per query

16774 relevance judgements for 50 queries => about 335 per query

1224 relevant documents for 50 queries => about 25 per query

about 60-70% of the docs in the top 1000 are unjudged ?! = counted as irrelevant ?!

A proposal:

Read all T docs. (T=100? 200?)
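
The arithmetic behind the remark, spelled out with the slide's numbers:

```python
# Numbers from the slide (Dutch data, 2001).
T = 1000            # documents retrieved per run and query
queries = 50
judgements = 16774  # total relevance judgements
relevant = 1224     # total relevant documents

judged_per_query = judgements / queries    # about 335 judged docs per query
relevant_per_query = relevant / queries    # about 25 relevant docs per query

# A run returns T = 1000 documents per query, but at most ~335 of them can
# have been judged at all; the other ~665 (roughly two thirds) are unknown,
# yet the standard evaluation counts them as irrelevant.
unjudged = (T - judged_per_query) / T
print(f"{judged_per_query:.1f} judged and {relevant_per_query:.1f} relevant "
      f"per query; at least {unjudged:.0%} of a run's top {T} is unjudged")
```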