Cross-Language Speech Retrieval and its Evaluation in the Malach Project

Pavel, [email protected]
Institute of Formal and Applied Linguistics, Charles University, Prague

Seminar of Formal Linguistics, UFAL, 20 November 2006
Introduction Project Overview Speech Recognition Speech Retrieval Retrieval Evaluation Conclusion
Cross-Language Speech Retrieval and its Evaluation

Information Retrieval
- searching for information in documents, or for the documents themselves
- searching a body of information for objects that match a search query
- the science and practice of identifying and efficiently using recorded data

Speech Retrieval
- a special case of IR in which the information is in spoken form

Cross-Language
- retrieving information in a language different from the language of the user's query

Evaluation
- deals with the effectiveness of IR systems: how well they perform
- measures how well users are able to acquire information
- usually comparative: ranks a better system ahead of a worse one
Roadmap

1. Introduction
2. Project Overview: History; Tasks and Goals
3. Speech Recognition: Challenges and Results; Demo
4. Speech Retrieval: Overview
5. Evaluation: Test Collection; Evaluation Measures; CLEF 2006
6. Conclusions
Project Overview
The story begins in 1993 with a movie and a vision

Steven Spielberg's vision of:
1. collecting and preserving survivor and witness testimony of the Holocaust
2. cataloging those testimonies to make them available
3. disseminating the testimonies for educational purposes to fight intolerance
4. enabling others to collect testimonies of other atrocities and historical events, or perhaps doing so itself
A brief history of the project

1993  Steven Spielberg releases Schindler's List. He is approached by survivors who want him to listen to their stories of the Holocaust.
1994  Spielberg starts the Survivors of the Shoah Visual History Foundation to videotape and preserve testimonies of Holocaust survivors and witnesses.
1999  The VHF has assembled the world's largest archive of videotaped oral histories, with interviews from 52,000 survivors, liberators, and rescuers from 57 countries.
2000  10% of the interviews have been manually cataloged by the VHF at a cost of $8 million; a single testimony consumes an average of 35 hours (indexing, summary, review).
2001  NSF project proposal with the goal of dramatically improving access to large multilingual spoken word collections. Principal investigators: UMD, JHU, IBM.
2001  Grant awarded; the project Multilingual Access to Large Spoken Archives launches with a $7.5 million budget distributed over five years; CU and UWB are part of the team.
The archive

- maintained by the Visual History Foundation/Institute
- assembled in 1994-1999 by 2,300 interviewers and 1,000 photographers
- contains testimonies of 52,000 survivors from 57 countries in 32 languages
- a total of 116,000 hours on VHS tapes, 180 TB of digitized MPEG-1 video
- average duration of a testimony: 2:15 hours; total cost per interview: $2,000
- extensive human cataloging completed for over 3,000 interviews (72 million words)
- manually indexed (time-aligned descriptors from a 30,000-keyword thesaurus)
- 573 interviews recorded in the Czech Republic by 38 interviewers
- 4,500 testimonies provided by people born in the Czech Republic
Full-description cataloging and annotations

Interview-level annotation
- pre-interview questionnaire
- names of people and places mentioned in the course of an interview
- free-text summary

Segment-level annotation
- topic boundaries (average 3 min/segment)
- descriptions: summary, cataloguer's scratchpad
- thesaurus labels: names, topics, locations, time periods

Location-Time   Concept               People
Berlin 1939     Employment            Josef Stein
Berlin 1939     Family life           Gretchen Stein, Anna Stein
Dresden 1939    Relocation,
                Transportation-rail
Dresden 1939    Schooling             Gunter Wendt, Maria
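A segment-level annotation like the rows above is naturally a small record: a time span plus thesaurus labels. A minimal sketch in Python (the field names are illustrative, not the archive's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One topic segment of an interview, with time-aligned thesaurus labels.

    Illustrative field names only, not the Malach cataloging schema.
    """
    start: float                 # seconds from the beginning of the interview
    end: float
    summary: str = ""
    locations_times: list[str] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)

# One row of the example table as a record
seg = Segment(start=540.0, end=720.0,
              locations_times=["Berlin 1939"],
              concepts=["Employment"],
              people=["Josef Stein"])
print(seg.concepts)
```

A full interview would then be a list of such segments, which is what makes time-aligned retrieval (jumping to the relevant passage, not just the relevant interview) possible.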
"Real-time" cataloging and annotations

Interview-level annotation
- pre-interview questionnaire

Time-aligned annotation
- thesaurus labels: names, concepts, locations, time periods

Location-Time   Concept               People
Berlin 1939     Employment            Josef Stein
                Family life           Gretchen Stein, Anna Stein
Dresden 1939    Relocation,
                Transportation-rail
                Schooling             Gunter Wendt, Maria
Cataloging interface
Interview languages (top 20)

language     interviews   share
English        24,872     49.18%
Russian         7,052     13.94%
Hebrew          6,126
French          1,875
Polish          1,549
Spanish         1,352
Dutch           1,077
Hungarian       1,038      2.05%
German            686
Bulgarian         645
Slovak            583      1.15%
Czech             573      1.13%
Portuguese        562
Yiddish           527
Italian           433
Serbian           382
Croatian          353
Ukrainian         320
Greek             301
Swedish           266
Project tasks and participants

1. automatic recognition of spontaneous speech (multilingual)
2. machine-supported translation of a domain-specific thesaurus
3. automatic topic boundary tagging and time-aligned metadata assignment
4. environment for cross-language information retrieval and browsing

IBM T.J. Watson Center, New York
- speech recognition in English

Center for Language and Speech Processing, JHU, Baltimore
- speech recognition in other languages
- Czech and other Slavic languages subcontracted to CU and UWB

University of Maryland, College Park
- archive browsing, information retrieval and its evaluation
- Czech test collection subcontracted to Charles University
Speech Recognition
Project challenges

- complete speaker-independent recognition of spontaneous speech
- relatively high technical quality of the recordings
- low "language quality", difficult even for a human listener:
  - spontaneous, emotional, disfluent, and whispered speech from elderly speakers
  - speech with background noise and frequent interruptions
  - heavily accented speech that switches between languages
  - speech with words such as names, obscure locations, unknown events
- a specific issue in Czech: colloquial expressions and pronunciation

  oběd:    [o b j e d], [o b j e t], [v o b j e d], [v o b j e t]
  Osvětim: [o s v j e t i m], [v o s v j e t i m], [o s v j e n c i m], [o z v j e t i m]
Speech recognition results

Word error rate (WER) is estimated on a sample of manually transcribed data as the ratio of misrecognized words.

language   WER (%)
English    25.0
Czech      35.0
Russian    45.7
Slovak     34.5

Manual transcriptions

language   transcribed data (h)
English    200
Czech       84
Russian    100
Slovak

(chart: WER over time, 4/02 to 3/05, one curve each for English, Czech, Russian, and Slovak; y-axis WER from 15 to 70)
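The WER figures above are the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and a manual transcript, divided by the number of reference words. A minimal sketch of that computation (the example sentences are invented; this is not the project's actual scoring tool):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution or match
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# one deleted word out of six reference words -> WER ~ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that insertions can push WER above 100%, which is why very hard conditions (spontaneous, accented, elderly speech) can produce the high early values shown in the chart.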
Speech recognition demo

name: * * *
day of birth: Dec 26, 1924
country: Czechoslovakia
religion: Judaism
keywords: hiding / death marches, underground / resistance

Pane Pavle, zacal jste historku o srncích a tatínkovi bez hvezdy. Jak to pokracovalo? Bylo, pokracovalo to tím zpusobem, ze tatínek si sundal hvezdu, pan doktor Jerab mu napsali skupinku na Kladne. To bylo bajecny doktor, ten a fandila. Nas tatínek se vydal na cestu na Krivoklatsko, aby upekla ze sem se ... Pochopitelne, ze strejda Prosek s tím nechtel nic mít. Za to byly kruty tresty, za to se tenkrat popravovalo. Takze strejda Prosek nepytlacil a bal se. Tady vsude v lesích byli Nemci. Strílelo se ... a nas tata se vydal na tuhle cestu a ubytoval se mnou sluvko toho v roce sem opravdu podarilo u pytlacit – za pomoci legendarní a volal na. To byl pes – vlcak, s kterem dríve Prosek nepytlacil, a ten proste kazdeho sem se nepytlacil ...
Speech recognition demo (raw recognizer output: lowercase, no punctuation)

name: Hugo Pavel
day of birth: Dec 26, 1924
country: Czechoslovakia
religion: Judaism
keywords: hiding / death marches, underground / resistance

pane pavle zacal jste historku o srncích a tatínkovi bez hvezdy jak to pokracovalo bylo pokracovalo to tím zpusobem ze tatínek si sundal hvezdu pan doktor jerab mu napsali skupinku na kladne to bylo bajecny doktor ten a fandila nas tatínek se vydal na cestu na krivoklatsko aby upekla ze sem se ... pochopitelne ze strejda prosek s tím nechtel nic mít za to byly kruty tresty za to se tenkrat popravovalo takze strejda prosek nepytlacil a bal se tady vsude v lesích byli nemci strílelo se ... a nas tata se vydal na tuhle cestu a ubytoval se mnou sluvko toho v roce sem opravdu podarilo u pytlacit – za pomoci legendarní a volal na to byl pes vlcak s kterem dríve prosek nepytlacil a ten proste kazdeho sem se nepytlacil ...
Information Retrieval
Bag-of-words representation / Vector space model

- a simple strategy for representing documents
- count how many times each word occurs (regardless of word order)
- a distribution over a fixed vocabulary

document d1: Indonesia started a huge security operation ahead of the arrival of President Bush

document d2: British Prime Minister Tony Blair flew to Afghanistan on Monday

word          d1  d2
a              1   0
Afghanistan    0   1
ahead          1   0
arrival        1   0
Blair          0   1
British        0   1
Bush           1   0
flew           0   1
huge           1   0
Indonesia      1   0
Minister       0   1
Monday         0   1
of             2   0
on             0   1
operation      1   0
President      1   0
Prime          0   1
security       1   0
started        1   0
the            1   0
to             0   1
Tony           0   1
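The bag-of-words construction above is a few lines of code; Python's collections.Counter does the counting. A sketch using the slide's two example documents:

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count how many times each word occurs, ignoring word order."""
    return Counter(text.split())

d1 = "Indonesia started a huge security operation ahead of the arrival of President Bush"
d2 = "British Prime Minister Tony Blair flew to Afghanistan on Monday"

b1, b2 = bag_of_words(d1), bag_of_words(d2)

# The fixed vocabulary is the union of all words seen in the collection
vocab = sorted(set(b1) | set(b2))
for w in vocab:
    print(f"{w:12} {b1[w]} {b2[w]}")   # Counter returns 0 for absent words
```

A Counter doubles as a sparse vector over the vocabulary, which is exactly the view the vector space model takes in the next step.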
Similarity–based ranking
1. Treat the query as if it were a document: create a query bag–of–terms2. Find the similarity of each document: compute inner product3. Rank order the documents by similarity: most similar to the query first
document d1
Indonesia started ahuge securityoperation ahead ofthe arrival ofPresident Bush
document d2
British PrimeMinister Tony Blairflew to Afghanistanon Monday
query
Prime Minister
word d1 d2 q
a 1 0 0Afghanistan 0 1 0ahead 1 0 0arrival 1 0 0Blair 0 1 0British 0 1 0Bush 1 0 0flew 0 1 0huge 1 0 0Indonesia 1 0 0Minister 0 1 1Monday 0 1 0of 1 0 0on 0 1 0operation 1 0 0President 1 0 0Prime 0 1 1security 1 0 0started 1 0 0the 1 0 0to 0 1 0Tony 0 1 0
19 / 55
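The three steps above can be sketched in a few lines of Python. The documents and query are the slide's own example; the unweighted term-count inner product stands in for weighted variants such as tf-idf.

```python
from collections import Counter

def bag_of_terms(text):
    """Step 1: turn a text (or a query) into a bag of terms."""
    return Counter(text.split())

def inner_product(doc_bag, query_bag):
    """Step 2: similarity as the inner product of term-count vectors."""
    return sum(doc_bag[t] * q for t, q in query_bag.items())

docs = {
    "d1": "Indonesia started a huge security operation ahead of the arrival of President Bush",
    "d2": "British Prime Minister Tony Blair flew to Afghanistan on Monday",
}
query = bag_of_terms("Prime Minister")

# Step 3: rank documents, most similar to the query first.
ranking = sorted(docs, key=lambda d: inner_product(bag_of_terms(docs[d]), query),
                 reverse=True)
print(ranking)  # d2 matches both query terms, d1 matches neither
```

With the example data, d2 scores 2 (Prime and Minister each match once) and d1 scores 0, so d2 is ranked first.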
What to use (and not use) as "words"?

Substrings
- overlapping character n–grams

Tokens
- white–space–delimited word forms

Normalized forms
- lemmas, stems

Shingles
- overlapping word n–grams

Multiwords
- collocations, multiword lexemes

Stopwords
- too frequent or too rare words
- closed–class words (prepositions, conjunctions, etc.)
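Two of the indexing-unit choices above, character n-grams and word shingles, are easy to illustrate. A minimal sketch (the underscore padding for spaces is an illustrative convention, not something from the slides):

```python
def char_ngrams(text, n):
    """Overlapping character n-grams (substrings); spaces marked with '_'."""
    s = text.replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def shingles(tokens, n):
    """Overlapping word n-grams (shingles) over a token stream."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Prime Minister Tony Blair".split()  # white-space tokenization
print(shingles(tokens, 2))   # word bigrams
print(char_ngrams("Prime", 3))  # ['Pri', 'rim', 'ime']
```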
Query expansion

Finds terms that could have been in the query
- synonyms
- other related terms

Blind relevance feedback is widely used
1. search once using the original query
2. find discriminating terms in the top–ranked documents
3. add new query terms / reweight existing terms

Several alternative approaches
- thesaurus–based expansion
- collocations
- LSI (latent semantic indexing)
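The three blind-relevance-feedback steps above can be sketched as follows. This is a deliberately simplified illustration: the `search` function is a stand-in for any ranked-retrieval engine, "discriminating" is approximated by raw frequency in the top-ranked documents, and the tiny corpus is invented for the example.

```python
from collections import Counter

def search(query_terms, corpus):
    """Stand-in ranked retrieval: score = count of query-term occurrences."""
    def score(doc_id):
        words = corpus[doc_id].split()
        return sum(words.count(t) for t in query_terms)
    return sorted(corpus, key=score, reverse=True)

def expand_query(query_terms, corpus, top_k=2, new_terms=2):
    ranked = search(query_terms, corpus)      # 1. search once, original query
    pool = Counter()
    for doc_id in ranked[:top_k]:             # 2. mine the top-ranked documents
        pool.update(corpus[doc_id].split())
    for t in query_terms:
        pool.pop(t, None)                     # keep only terms new to the query
    return query_terms + [t for t, _ in pool.most_common(new_terms)]  # 3. add

corpus = {
    "d1": "Prime Minister Tony Blair Prime Minister",
    "d2": "President Bush security operation",
}
print(expand_query(["Prime", "Minister"], corpus))
```

In a full system the added terms would typically also be down-weighted relative to the original ones, which this sketch omits.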
Back to Speech Retrieval
Some key insights

Speech Retrieval (recap)
- A special case of IR in which the information is in spoken form

Recognition and retrieval can be decomposed
- Build the IR system on ASR output.

Retrieval is robust to recognition errors
- Up to a 40% word error rate is tolerable (TREC result)

Recognition errors may not bother the system, but they do bother the user
- Retrieval based on ASR output should return playback points.

Segment–level indexing/summary is useful
- Vocabulary shifts and pauses provide strong cues for boundary tagging
System overview

[Pipeline diagram: a Data processing stage (Speech Recognition, Boundary Tagging, Content Tagging) feeds Indexing, which supports the Run–time stage (Query Formulation, Automatic Search, Interactive Selection).]

credit to Doug Oard
Document processing

[Pipeline diagram: Speech Transcription → Boundary Tagging / Content Tagging → Document Representation. Example segment labels: "Berlin 1939, Employment, Josef Stein"; "Berlin 1939, Family life, Gretchen Stein, Anna Stein"; "Dresden 1939, Relocation, Transportation–rail"; "Dresden 1939, Schooling, Gunter Wendt, Maria".]
Spoken document example

Passage from an English interview with full annotation

doc no          00009-056150.002
interview data  Sidonia L., 1930
name            Issac L., Cyla L.
manual keyword  family businesses, family life, food, Przemysl (Poland)
summary         SL describes her parents and their roles in the family business. She remembers her home and she recalls her responsibilities. ...
asr text        were to tell us about that my mother's name was sell us c y l a new and her maiden name was leap shark l i e b b a c h a r d my mother was a dress ...
auto keyword    family businesses, family homes, means of adaptation and survival, extended family members ...
Spoken document example

Passage from a Czech interview with brief annotation

doc no          01539-025217.003
interview data  Alois P., 1927
name            -na-
manual keyword  -na-
summary         -na-
asr text        a když nějaká ta dívenka na na a pláži svolávali Stalin že jo a nu nás fotograf přišel úplně sem byl strašně hrdý že se mě rozuměla a pak sem věděl takže se říká dva zešílení to je že na jedné kde už pak už bylo vše říkalo že venku koho jste si vzal jak jste se seznámili dobře zůstala v Polsku ...
auto keyword    -na-
Evaluation
Evaluation

Question
- How well do we perform?

Criteria
- Effectiveness, efficiency, usability

User–centered strategy
- Given several users and at least two retrieval systems
- Have each user try the same task on both systems
- Measure which system works the "best"

System–centered strategy
- Given documents, queries, and relevance judgements (= a test collection)
- Try several variations on the retrieval systems
- Measure which ranks more good docs near the top
Test collection design

Documents
- Representative sources (interviewees)
- Representative topics (stories)

Topics
- Detailed and structured descriptions of actual information needs
- Used as a basis for query formulation

Relevance judgements
- Document–topic relations (binary relevance)
- Created by humans, perpetually valid
- Sampled, focusing on documents likely to be retrieved
Document collections (Malach)

English
- 297 interviews
- known–boundary condition (segments)
- 8,104 topically–coherent segments
- average 503 words/segment
- ASR: 25% mean Word Error Rate

Czech
- 350 interviews
- unknown segment boundaries
- 3–minute automatically generated passages, with 67% overlap
- start times used as DOCNO for easy results
- ASR: 35% mean Word Error Rate

Distributed to track participants by ELDA
Topic construction

- 115 representative topics developed from actual user requests:
  - Scholars, educators, documentary film makers, and others produced 250 topic–oriented written requests for materials from the collection.
- translated from English into Czech, French, German, Spanish, and Dutch
- TREC–like topic descriptions consist of a title, a short description, and a narrative description:

num    1173
title  Dětské umění v Terezíně [Children's art in Terezín]
desc   Hledáme popis uměleckých aktivit dětí v Terezíně, jako např. hudby, divadla, malování, poezie a jiných psaných děl.
narr   Relevantní materiál by měl obsahovat diskuse o těchto aktivitách a to, jak ovlivnily přečkání holokaustu a následný život dětí. Zejména jsou žádané příběhy, ve kterých účastník rozhovoru uvádí příklady takových aktivit.

num    1431
title  Denní život v Terezíně [Daily life in Terezín]
desc   Popište denní život v táboře Terezín.
narr   Anekdoty, příběhy nebo detaily o denním životě v táboře. Relevantní jsou příběhy, které se zmiňují o modlitbách, svátcích, volném čase vězňů a strukturách vnitřní vlády v táborech. Příběhy, které popisují neobvyklé události v životě vězňů, relevantní nejsou.
Relevance assessment

A manual process to acquire relevance judgements for document–topic pairs. Ideally this would cover all document–topic pairs, which is infeasible.

Search–guided relevance assessment
- for each topic, the set of documents to be judged is restricted to those found potentially relevant by full–text search
- topic research → query formulation → search → judging

Highly ranked (pooled) relevance assessment
- restriction based on the results of actual evaluation runs
- n–deep pools from m systems

[Venn diagram: within the set of All documents, the Relevant set is progressively covered by a 1st round and a 2nd round of judging.]
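The "n-deep pools from m systems" idea above amounts to a set union. A minimal sketch (the run data is invented for illustration):

```python
def build_pool(runs, n):
    """Pooled assessment: judge the union of the top-n documents
    from each of the m submitted ranked runs."""
    judged = set()
    for ranked in runs:          # one ranked doc-id list per system
        judged.update(ranked[:n])
    return judged

run_a = ["d1", "d2", "d3", "d4"]
run_b = ["d2", "d5", "d1", "d6"]
print(sorted(build_pool([run_a, run_b], 2)))  # ['d1', 'd2', 'd5']
```

Documents outside the pool are treated as not relevant, which is why the diagram's judged rounds only approximate the true relevant set.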
Relevance categories

Direct: direct evidence for what the user asks for
  Example: talks about food given to Auschwitz inmates

Indirect: indirect evidence on the topic; data from which one could infer something about the topic
  Example: talks about seeing emaciated people in Auschwitz

Context: background/context for the topic; sets the stage for what the user asks for; sheds additional light on the topic
  Example: talks about physical labor of Auschwitz inmates

Comparison: information on a similar/parallel/contrasting situation
  Example: talks about food in the Warsaw ghetto

Overall: strictly from the point of view of finding out about the topic, how useful this segment is for the requester
  Example: talks about food in Auschwitz

- Five levels of relevance for each category (0 = none, 4 = highly)
- Collapsed to binary relevance using {Direct ≥ 2} ∪ {Indirect ≥ 2}
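The binary-collapse rule above is a one-line predicate over the graded judgements. In this sketch the dictionary layout (lowercase category keys) is an illustrative assumption, not the collection's actual file format:

```python
def is_relevant(judgement):
    """Graded judgement (0-4 per category) -> binary relevance:
    relevant iff Direct >= 2 or Indirect >= 2."""
    return judgement["direct"] >= 2 or judgement["indirect"] >= 2

# Hypothetical judgements for two segments:
j1 = {"direct": 3, "indirect": 0, "context": 4, "comparison": 0, "overall": 3}
j2 = {"direct": 1, "indirect": 1, "context": 4, "comparison": 2, "overall": 2}
print(is_relevant(j1), is_relevant(j2))  # True False
```

Note that a segment judged highly useful only as Context or Comparison (like j2) counts as not relevant under this collapse.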
Effectiveness measures

[Venn diagram: Retrieved and Relevant as overlapping subsets of All documents.]

document      retrieved           not retrieved
relevant      relevant retrieved  relevant missed
not relevant  false alarm         irrelevant rejected

Precision = |relevant retrieved| / |retrieved|

Recall = |relevant retrieved| / |relevant|
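Both measures follow directly from the set picture above. A minimal sketch with invented document ids:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall from sets of document ids."""
    hit = len(retrieved & relevant)      # |relevant retrieved|
    return hit / len(retrieved), hit / len(relevant)

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 and 2/3: 2 hits out of 4 retrieved, out of 3 relevant
```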
Evaluation metrics: example

Precision = |relevant retrieved| / |retrieved|

Recall = |relevant retrieved| / |relevant|

system results (ranked by score):

int.3431-seg.125 0.96
int.7264-seg.068 0.93
int.3012-seg.142 0.88
int.9595-seg.119 0.85
int.2306-seg.120 0.79
int.9262-seg.131 0.75
int.4188-seg.033 0.73
int.6805-seg.007 0.65
int.2591-seg.123 0.62
int.5325-seg.015 0.61
int.6042-seg.038 0.55
int.1066-seg.149 0.54
int.3215-seg.071 0.52
int.4404-seg.023 0.45
int.2012-seg.121 0.34
int.6707-seg.116 0.21

A score threshold turns the ranked list into a binary classification (retrieved vs. not retrieved). As the threshold is lowered step by step, precision and recall change:

retrieved  precision  recall
4          100%       50%
5          80%        50%
6          83%        62%
7          85%        75%
8          75%        75%
9          77%        87%
10         70%        87%
11         72%        100%
Introduction Project Overview Speech Recognition Speech Retrieval Retrieval Evaluation Conclusion
Evaluation metrics: example
Precision =|relevant retrieved ||retrieved |
Recall =|relevant retrieved |
|relevant |
system results
int.3431-seg.125 0.91int.7264-seg.068 0.93int.3012-seg.142 0.88int.9595-seg.119 0.85int.2306-seg.120 0.79int.9262-seg.131 0.75int.4188-seg.033 0.73int.6805-seg.007 0.65int.2591-seg.123 0.62int.5325-seg.015 0.61int.6042-seg.038 0.55int.1066-seg.149 0.54int.3215-seg.071 0.52int.4404-seg.023 0.45int.2012-seg.121 0.34int.6707-seg.116 0.21
classification confusiony
int.3431-seg.125 1int.7264-seg.068 1int.3012-seg.142 1int.9595-seg.119 1int.2306-seg.120 1int.9262-seg.131 1int.4188-seg.033 1int.6805-seg.007 1int.2591-seg.123 1int.5325-seg.015 1int.6042-seg.038 1int.1066-seg.149 1int.3215-seg.071 1int.4404-seg.023 1int.2012-seg.121 1int.6707-seg.116 1
precision recally
100% 12%100% 25%100% 37%100% 50%80% 50%83% 62%85% 75%75% 75%77% 87%70% 87%72% 100%66% 100%61% 100%57% 100%53% 100%50% 100%
42 / 55
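The slide's per-rank numbers can be reproduced with a short sketch. The segment IDs are from the slide; the binary relevance flags are the ones implied by the printed precision/recall sequence.

```python
# Hypothetical re-computation of the example slide's numbers: a ranked
# result list with binary relevance judgments, walked rank by rank.
ranked = [  # (segment id, relevant?)
    ("int.3431-seg.125", 1), ("int.7264-seg.068", 1), ("int.3012-seg.142", 1),
    ("int.9595-seg.119", 1), ("int.2306-seg.120", 0), ("int.9262-seg.131", 1),
    ("int.4188-seg.033", 1), ("int.6805-seg.007", 0), ("int.2591-seg.123", 1),
    ("int.5325-seg.015", 0), ("int.6042-seg.038", 1), ("int.1066-seg.149", 0),
    ("int.3215-seg.071", 0), ("int.4404-seg.023", 0), ("int.2012-seg.121", 0),
    ("int.6707-seg.116", 0),
]
total_relevant = sum(rel for _, rel in ranked)  # 8 in this example

hits = 0
for rank, (seg, rel) in enumerate(ranked, start=1):
    hits += rel
    precision = hits / rank            # |relevant ∩ retrieved| / |retrieved|
    recall = hits / total_relevant     # |relevant ∩ retrieved| / |relevant|
    print(f"{rank:2d}  {seg}  P={precision:.0%}  R={recall:.0%}")
```

Each cutoff rank is treated as "everything retrieved so far", which is why precision can fall while recall only ever grows.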
Precision–recall curves

[Figure: precision–recall curve; x-axis: Recall (0–1), y-axis: Precision (0–1)]

credit to Doug Oard 43 / 55
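A sketch (not from the slides) of how such curves are typically drawn: "interpolated" precision at recall r is the best precision achieved at recall r or higher, which smooths the raw sawtooth points into a non-increasing curve. The sample points below are illustrative.

```python
# Interpolate a precision-recall curve: for each point, take the maximum
# precision at that recall level or any later (higher-recall) point.
def interpolated_curve(points):
    """points: (recall, precision) pairs in rank order (recall non-decreasing)."""
    return [(r, max(p for _, p in points[i:])) for i, (r, _) in enumerate(points)]

raw = [(0.50, 1.00), (0.50, 0.80), (0.62, 0.83), (0.75, 0.85), (1.00, 0.72)]
print(interpolated_curve(raw))
```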
Average precision

I the expected value of precision over all possible values of recall
I equal to the area under the interpolated precision–recall curve (AUC)

[Figure: interpolated precision vs. recall; Average Precision = 0.477]

credit to Doug Oard 44 / 55
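In practice, uninterpolated average precision is computed as the mean of the precision values at the ranks where relevant documents appear. A minimal sketch, using the relevance pattern from the earlier worked example (not the ranking behind the slide's 0.477 figure):

```python
# Average precision: mean of precision-at-rank over the relevant documents.
def average_precision(relevance):
    """relevance: list of 0/1 flags in ranked order (1 = relevant)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

ap = average_precision([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0])
print(round(ap, 3))  # ≈ 0.899 for this ranking
```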
Mean average precision

[Figure: average precision per query (queries 1–40); the example query scores 0.477; Mean Average Precision = 0.188]

Measuring improvement

I Meaningful improvement: 0.05 is noticeable, 0.1 makes a difference
I Reliable improvement: Wilcoxon signed–rank test for paired samples
I Maximum precision limit: set by inter–assessor agreement

credit to Doug Oard 45 / 55
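Mean average precision is simply the arithmetic mean of the per-query AP values. A sketch with made-up per-query scores (the slide's 0.188 comes from its own query set):

```python
# MAP over a set of queries; the AP values here are illustrative only.
per_query_ap = [0.477, 0.120, 0.050, 0.301, 0.002]
map_score = sum(per_query_ap) / len(per_query_ap)
print(round(map_score, 3))  # ≈ 0.19

# To decide whether system B reliably beats system A, pair the per-query
# AP values and apply the Wilcoxon signed-rank test (e.g. scipy.stats.wilcoxon).
```

Because AP varies widely across queries, comparing systems on paired per-query scores is more reliable than comparing the two MAP numbers alone.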
Cross-Language Evaluation Forum (CLEF)
I an activity of the DELOS Network of Excellence for Digital Libraries under the Sixth Framework Programme of the European Commission

I develops an infrastructure for testing, tuning, and evaluating IR systems on European languages in both monolingual and cross–language contexts

I creates test suites of reusable data which can be employed by system developers for benchmarking purposes

I offers a series of evaluation tracks to test different aspects of cross–language information retrieval system development:
1. Mono–, Bi– and Multilingual Document Retrieval on News Collections
2. Mono and Cross–Language IR on Structured Scientific Data
3. Interactive Cross–Language IR
4. Multiple Language Question Answering
5. Cross–Language Retrieval in Image Collections
6. Multilingual Web Track (WebCLEF)
7. Cross–Language Speech Retrieval
8. Cross–Language Geographical Retrieval
46 / 55
CLEF 2006 Cross–language speech retrieval track overview
Two tasks: English segments, Czech start times
I Max of 5 official runs per team per task
I Baseline English run: ASR / English TD topics
7 teams / 6 countries
I Canada: Ottawa (EN, CZ)
I Czech Rep: West Bohemia (CZ)
I Ireland: DCU (EN)
I Netherlands: Twente (EN)
I Spain: Alicante (EN), UNED (EN)
I USA: Maryland (EN, CZ)
Schedule
Jan 15   Registration opened
Apr 14   Data release
May 1    Topic release
Jun 6    Submission of runs by participants
Aug 1    Release of relevance assessments and individual results
Aug 8    Submission of papers for Working Notes
Sep 20 Workshop (Alicante, Spain)
47 / 55
English results (MAP)

Queries: title + description
Documents: ASR output

[Figure: mean uninterpolated average precision (0.00–0.10) per team: DCU, Ottawa, Maryland, Twente, UNED, Alicante]

48 / 55
Monolingual vs. cross–language comparisons (MAP)
Queries: title/description/narrative
Documents: ASR output, metadata, both
Team Q D English French Dutch Spanish
Ottawa TDN Meta 0.2902
Twente T Meta 0.2058 80%
DCU TD Both 0.2015 79%
Maryland TD Meta 0.2350 44%
UNED TD Meta 0.1766 51%
Ottawa TDN ASR 0.0768 83% 81%
DCU TD ASR 0.0733 63%
Twente T ASR 0.0495 77%
UNED TD ASR 0.0376 68%
Maryland TD ASR 0.0543 38%
49 / 55
Conclusion
50 / 55
Call for participation: CLEF 2007
I Registration Opens - 15 January 2007
I Data Release - 15 February 2007
I Topic Release - 1 April 2007
I Submission of Runs by Participants - 15 June 2007
I Release of Relevance Assessments and Individual Results - 15 July 2007
I Submission of Paper for Working Notes - 15 August 2007
I Workshop - 19–21 September 2007
1. Multilingual Document Retrieval on News Collections
2. Scientific Data Retrieval
3. Interactive Cross–Language Information Retrieval
4. Multiple Language Question Answering
5. Cross–Language Image Retrieval
6. Cross–Language Spoken Retrieval
7. CLEF Web Track
8. Cross–Language Geographical Information Retrieval
51 / 55
Thank You!
52 / 55
People said ...
Doug Greenberg:
I “We don’t edit any of these interviews. It’s completely raw footage takendirectly from interviews with survivors. It will be broadly accessible, but itwon’t be edited.”
I “Our mission now is to use the archive in educational settings to overcomeprejudice and bigotry.”
Doug Oard:
I “There’s a lot more oral history than anybody even knows about”.
I “It isn’t as good as a human cataloging, but it’s $100 million cheaper.”
I “When you develop this type of technology, you open a lot of doors,”
53 / 55
Links
I Shoah Foundation / Visual History Institutehttp://www.usc.edu/schools/college/vhi/
I Malach Projecthttp://malach.umiacs.umd.edu/
I Cross–Language Evaluation Forumhttp://www.clef-campaign.org/
I Cross–Language Speech Retrieval track of CLEFhttp://clef-clsr.umiacs.umd.edu/
54 / 55