Cross-Language Speech Retrieval and its Evaluation in the Malach Project

Pavel, [email protected]
Institute of Formal and Applied Linguistics, Charles University, Prague

Seminar of Formal Linguistics, UFAL, 20 November 2006
Introduction Project Overview Speech Recognition Speech Retrieval Retrieval Evaluation Conclusion
Cross-Language Speech Retrieval and its Evaluation

Information Retrieval
- searching for information in documents, or for the documents themselves
- searching a body of information for objects that match a search query
- the science and practice of identifying and efficiently using recorded data

Speech Retrieval
- a special case of IR in which the information is in spoken form

Cross-Language
- retrieving information in a language different from the language of the user's query

Evaluation
- deals with the effectiveness of IR systems: how well they perform
- measures how well users are able to acquire information
- usually comparative: ranks a better system ahead of a worse one
Roadmap

1. Introduction
2. Project Overview: History; Tasks and Goals
3. Speech Recognition: Challenges and Results; Demo
4. Speech Retrieval: Overview
5. Evaluation: Test Collection; Evaluation Measures; CLEF 2006
6. Conclusions
Project Overview
The story begins in 1993 with a movie and a vision

Steven Spielberg's vision of:
1. collecting and preserving survivor and witness testimony of the Holocaust
2. cataloging those testimonies to make them available
3. disseminating the testimonies for educational purposes to fight intolerance
4. enabling others to collect testimonies of other atrocities and historical events, or perhaps doing so itself
A brief history of the project

1993  Steven Spielberg releases Schindler's List. He is approached by survivors who want him to listen to their stories of the Holocaust.
1994  Spielberg starts the Survivors of the Shoah Visual History Foundation to videotape and preserve testimonies of Holocaust survivors and witnesses.
1999  The VHF has assembled the world's largest archive of videotaped oral histories, with interviews from 52,000 survivors, liberators, and rescuers from 57 countries.
2000  10% of the interviews have been manually cataloged by the VHF at a cost of $8 million; a single testimony consumes an average of 35 hours (indexing, summary, review).
2001  NSF project proposal with the goal of dramatically improving access to large multilingual spoken word collections. Principal investigators: UMD, JHU, IBM.
2001  Grant awarded; the project Multilingual Access to Large Spoken Archives launches with a $7.5 million budget distributed over five years; CU and UWB are part of the team.
The archive

- maintained by the Visual History Foundation/Institute
- assembled in 1994-1999 by 2,300 interviewers and 1,000 photographers
- contains testimonies of 52,000 survivors from 57 countries in 32 languages
- a total of 116,000 hours on VHS tapes, 180 TB of digitized MPEG-1 video
- average duration of a testimony: 2:15 hours; total cost per interview: $2,000
- extensive human cataloging completed for over 3,000 interviews (72 million words)
- manually indexed (time-aligned descriptors from a 30,000-keyword thesaurus)
- 573 interviews recorded in the Czech Republic by 38 interviewers
- 4,500 testimonies provided by people born in the Czech Republic
Full-description cataloging and annotations

Interview-level annotation
- pre-interview questionnaire
- names of people and places mentioned in the course of an interview
- free-text summary

Segment-level annotation
- topic boundaries (average 3 min/segment)
- descriptions: summary, cataloguer's scratchpad
- thesaurus labels: names, topics, locations, time periods

Location-Time   Concept               People
Berlin 1939     Employment            Josef Stein
Berlin 1939     Family life           Gretchen Stein, Anna Stein
Dresden 1939    Relocation,
                Transportation-rail
Dresden 1939    Schooling             Gunter Wendt, Maria
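A segment-level annotation like the rows above is naturally a small record: a time span plus thesaurus labels. A minimal sketch in Python (the field names are illustrative, not the archive's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """One topic segment of an interview, with time-aligned thesaurus labels.

    Illustrative field names only, not the Malach cataloging schema.
    """
    start: float                 # seconds from the beginning of the interview
    end: float
    summary: str = ""
    locations_times: list[str] = field(default_factory=list)
    concepts: list[str] = field(default_factory=list)
    people: list[str] = field(default_factory=list)

# One row of the example table as a record
seg = Segment(start=540.0, end=720.0,
              locations_times=["Berlin 1939"],
              concepts=["Employment"],
              people=["Josef Stein"])
print(seg.concepts)
```

A full interview would then be a list of such segments, which is what makes time-aligned retrieval (jumping to the relevant passage, not just the relevant interview) possible.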
"Real-time" cataloging and annotations

Interview-level annotation
- pre-interview questionnaire

Time-aligned annotation
- thesaurus labels: names, concepts, locations, time periods

Location-Time   Concept               People
Berlin 1939     Employment            Josef Stein
                Family life           Gretchen Stein, Anna Stein
Dresden 1939    Relocation,
                Transportation-rail
                Schooling             Gunter Wendt, Maria
Cataloging interface
Interview languages (top 20)

language     interviews   share
English        24,872     49.18%
Russian         7,052     13.94%
Hebrew          6,126
French          1,875
Polish          1,549
Spanish         1,352
Dutch           1,077
Hungarian       1,038      2.05%
German            686
Bulgarian         645
Slovak            583      1.15%
Czech             573      1.13%
Portuguese        562
Yiddish           527
Italian           433
Serbian           382
Croatian          353
Ukrainian         320
Greek             301
Swedish           266
Project tasks and participants

1. automatic recognition of spontaneous speech (multilingual)
2. machine-supported translation of a domain-specific thesaurus
3. automatic topic boundary tagging and time-aligned metadata assignment
4. environment for cross-language information retrieval and browsing

IBM T.J. Watson Center, New York
- speech recognition in English

Center for Language and Speech Processing, JHU, Baltimore
- speech recognition in other languages
- Czech and other Slavic languages subcontracted to CU and UWB

University of Maryland, College Park
- archive browsing, information retrieval and its evaluation
- Czech test collection subcontracted to Charles University
Speech Recognition
Project challenges

- complete speaker-independent recognition of spontaneous speech
- relatively high technical quality of the recordings
- low "language quality", difficult even for a human listener:
  - spontaneous, emotional, disfluent, and whispered speech from elderly speakers
  - speech with background noise and frequent interruptions
  - heavily accented speech that switches between languages
  - speech with words such as names, obscure locations, unknown events
- a specific issue in Czech: colloquial expressions and pronunciation

  oběd:    [o b j e d], [o b j e t], [v o b j e d], [v o b j e t]
  Osvětim: [o s v j e t i m], [v o s v j e t i m], [o s v j e n c i m], [o z v j e t i m]
Speech recognition results

Word error rate (WER) is estimated on a sample of manually transcribed data as the ratio of misrecognized words.

language   WER (%)
English    25.0
Czech      35.0
Russian    45.7
Slovak     34.5

Manual transcriptions

language   transcribed data (h)
English    200
Czech       84
Russian    100
Slovak

(chart: WER over time, 4/02 to 3/05, one curve each for English, Czech, Russian, and Slovak; y-axis WER from 15 to 70)
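The WER figures above are the word-level edit distance (substitutions, deletions, insertions) between the recognizer output and a manual transcript, divided by the number of reference words. A minimal sketch of that computation (the example sentences are invented; this is not the project's actual scoring tool):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution or match
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# one deleted word out of six reference words -> WER ~ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Note that insertions can push WER above 100%, which is why very hard conditions (spontaneous, accented, elderly speech) can produce the high early values shown in the chart.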
Speech recognition demo

name: * * *
day of birth: Dec 26, 1924
country: Czechoslovakia
religion: Judaism
keywords: hiding / death marches, underground / resistance

Pane Pavle, zacal jste historku o srncích a tatínkovi bez hvezdy. Jak to pokracovalo? Bylo, pokracovalo to tím zpusobem, ze tatínek si sundal hvezdu, pan doktor Jerab mu napsali skupinku na Kladne. To bylo bajecny doktor, ten a fandila. Nas tatínek se vydal na cestu na Krivoklatsko, aby upekla ze sem se ... Pochopitelne, ze strejda Prosek s tím nechtel nic mít. Za to byly kruty tresty, za to se tenkrat popravovalo. Takze strejda Prosek nepytlacil a bal se. Tady vsude v lesích byli Nemci. Strílelo se ... a nas tata se vydal na tuhle cestu a ubytoval se mnou sluvko toho v roce sem opravdu podarilo u pytlacit – za pomoci legendarní a volal na. To byl pes – vlcak, s kterem dríve Prosek nepytlacil, a ten proste kazdeho sem se nepytlacil ...
Speech recognition demo (raw recognizer output: lowercase, no punctuation)

name: Hugo Pavel
day of birth: Dec 26, 1924
country: Czechoslovakia
religion: Judaism
keywords: hiding / death marches, underground / resistance

pane pavle zacal jste historku o srncích a tatínkovi bez hvezdy jak to pokracovalo bylo pokracovalo to tím zpusobem ze tatínek si sundal hvezdu pan doktor jerab mu napsali skupinku na kladne to bylo bajecny doktor ten a fandila nas tatínek se vydal na cestu na krivoklatsko aby upekla ze sem se ... pochopitelne ze strejda prosek s tím nechtel nic mít za to byly kruty tresty za to se tenkrat popravovalo takze strejda prosek nepytlacil a bal se tady vsude v lesích byli nemci strílelo se ... a nas tata se vydal na tuhle cestu a ubytoval se mnou sluvko toho v roce sem opravdu podarilo u pytlacit – za pomoci legendarní a volal na to byl pes vlcak s kterem dríve prosek nepytlacil a ten proste kazdeho sem se nepytlacil ...
Information Retrieval
Bag-of-words representation / Vector space model

- a simple strategy for representing documents
- count how many times each word occurs (regardless of word order)
- a distribution over a fixed vocabulary

document d1: Indonesia started a huge security operation ahead of the arrival of President Bush

document d2: British Prime Minister Tony Blair flew to Afghanistan on Monday

word          d1  d2
a              1   0
Afghanistan    0   1
ahead          1   0
arrival        1   0
Blair          0   1
British        0   1
Bush           1   0
flew           0   1
huge           1   0
Indonesia      1   0
Minister       0   1
Monday         0   1
of             2   0
on             0   1
operation      1   0
President      1   0
Prime          0   1
security       1   0
started        1   0
the            1   0
to             0   1
Tony           0   1
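The bag-of-words construction above is a few lines of code; Python's collections.Counter does the counting. A sketch using the slide's two example documents:

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count how many times each word occurs, ignoring word order."""
    return Counter(text.split())

d1 = "Indonesia started a huge security operation ahead of the arrival of President Bush"
d2 = "British Prime Minister Tony Blair flew to Afghanistan on Monday"

b1, b2 = bag_of_words(d1), bag_of_words(d2)

# The fixed vocabulary is the union of all words seen in the collection
vocab = sorted(set(b1) | set(b2))
for w in vocab:
    print(f"{w:12} {b1[w]} {b2[w]}")   # Counter returns 0 for absent words
```

A Counter doubles as a sparse vector over the vocabulary, which is exactly the view the vector space model takes in the next step.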
Similarity–based ranking
1. Treat the query as if it were a document: create a query bag–of–terms2. Find the similarity of each document: compute inner product3. Rank order the documents by similarity: most similar to the query first
document d1
Indonesia started ahuge securityoperation ahead ofthe arrival ofPresident Bush
document d2
British PrimeMinister Tony Blairflew to Afghanistanon Monday
query
Prime Minister
word d1 d2 q
a 1 0 0Afghanistan 0 1 0ahead 1 0 0arrival 1 0 0Blair 0 1 0British 0 1 0Bush 1 0 0flew 0 1 0huge 1 0 0Indonesia 1 0 0Minister 0 1 1Monday 0 1 0of 1 0 0on 0 1 0operation 1 0 0President 1 0 0Prime 0 1 1security 1 0 0started 1 0 0the 1 0 0to 0 1 0Tony 0 1 0
19 / 55
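The three steps above can be sketched in a few lines of Python. The documents and query are the slide's own example; the unweighted term-count inner product stands in for weighted variants such as tf-idf.

```python
from collections import Counter

def bag_of_terms(text):
    """Step 1: turn a text (or a query) into a bag of terms."""
    return Counter(text.split())

def inner_product(doc_bag, query_bag):
    """Step 2: similarity as the inner product of term-count vectors."""
    return sum(doc_bag[t] * q for t, q in query_bag.items())

docs = {
    "d1": "Indonesia started a huge security operation ahead of the arrival of President Bush",
    "d2": "British Prime Minister Tony Blair flew to Afghanistan on Monday",
}
query = bag_of_terms("Prime Minister")

# Step 3: rank documents, most similar to the query first.
ranking = sorted(docs, key=lambda d: inner_product(bag_of_terms(docs[d]), query),
                 reverse=True)
print(ranking)  # d2 matches both query terms, d1 matches neither
```

With the example data, d2 scores 2 (Prime and Minister each match once) and d1 scores 0, so d2 is ranked first.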
What to use (and not use) as "words"?

Substrings
- overlapping character n–grams

Tokens
- white–space–delimited word forms

Normalized forms
- lemmas, stems

Shingles
- overlapping word n–grams

Multiwords
- collocations, multiword lexemes

Stopwords
- too frequent or too rare words
- closed–class words (prepositions, conjunctions, etc.)
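Two of the indexing-unit choices above, character n-grams and word shingles, are easy to illustrate. A minimal sketch (the underscore padding for spaces is an illustrative convention, not something from the slides):

```python
def char_ngrams(text, n):
    """Overlapping character n-grams (substrings); spaces marked with '_'."""
    s = text.replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def shingles(tokens, n):
    """Overlapping word n-grams (shingles) over a token stream."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Prime Minister Tony Blair".split()  # white-space tokenization
print(shingles(tokens, 2))   # word bigrams
print(char_ngrams("Prime", 3))  # ['Pri', 'rim', 'ime']
```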
Query expansion

Finds terms that could have been in the query
- synonyms
- other related terms

Blind relevance feedback is widely used
1. search once using the original query
2. find discriminating terms in the top–ranked documents
3. add new query terms / reweight existing terms

Several alternative approaches
- thesaurus–based expansion
- collocations
- LSI (latent semantic indexing)
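The three blind-relevance-feedback steps above can be sketched as follows. This is a deliberately simplified illustration: the `search` function is a stand-in for any ranked-retrieval engine, "discriminating" is approximated by raw frequency in the top-ranked documents, and the tiny corpus is invented for the example.

```python
from collections import Counter

def search(query_terms, corpus):
    """Stand-in ranked retrieval: score = count of query-term occurrences."""
    def score(doc_id):
        words = corpus[doc_id].split()
        return sum(words.count(t) for t in query_terms)
    return sorted(corpus, key=score, reverse=True)

def expand_query(query_terms, corpus, top_k=2, new_terms=2):
    ranked = search(query_terms, corpus)      # 1. search once, original query
    pool = Counter()
    for doc_id in ranked[:top_k]:             # 2. mine the top-ranked documents
        pool.update(corpus[doc_id].split())
    for t in query_terms:
        pool.pop(t, None)                     # keep only terms new to the query
    return query_terms + [t for t, _ in pool.most_common(new_terms)]  # 3. add

corpus = {
    "d1": "Prime Minister Tony Blair Prime Minister",
    "d2": "President Bush security operation",
}
print(expand_query(["Prime", "Minister"], corpus))
```

In a full system the added terms would typically also be down-weighted relative to the original ones, which this sketch omits.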
Back to Speech Retrieval
Some key insights

Speech Retrieval (recap)
- A special case of IR in which the information is in spoken form

Recognition and retrieval can be decomposed
- Build the IR system on ASR output.

Retrieval is robust to recognition errors
- Up to a 40% word error rate is tolerable (TREC result)

Recognition errors may not bother the system, but they do bother the user
- Retrieval based on ASR output should return playback points.

Segment–level indexing/summary is useful
- Vocabulary shifts and pauses provide strong cues for boundary tagging
System overview

[Pipeline diagram: a Data processing stage (Speech Recognition, Boundary Tagging, Content Tagging) feeds Indexing, which supports the Run–time stage (Query Formulation, Automatic Search, Interactive Selection).]

credit to Doug Oard
Document processing

[Pipeline diagram: Speech Transcription → Boundary Tagging / Content Tagging → Document Representation. Example segment labels: "Berlin 1939, Employment, Josef Stein"; "Berlin 1939, Family life, Gretchen Stein, Anna Stein"; "Dresden 1939, Relocation, Transportation–rail"; "Dresden 1939, Schooling, Gunter Wendt, Maria".]
Spoken document example

Passage from an English interview with full annotation

doc no          00009-056150.002
interview data  Sidonia L., 1930
name            Issac L., Cyla L.
manual keyword  family businesses, family life, food, Przemysl (Poland)
summary         SL describes her parents and their roles in the family business. She remembers her home and she recalls her responsibilities. ...
asr text        were to tell us about that my mother's name was sell us c y l a new and her maiden name was leap shark l i e b b a c h a r d my mother was a dress ...
auto keyword    family businesses, family homes, means of adaptation and survival, extended family members ...
Spoken document example

Passage from a Czech interview with brief annotation

doc no          01539-025217.003
interview data  Alois P., 1927
name            -na-
manual keyword  -na-
summary         -na-
asr text        a když nějaká ta dívenka na na a pláži svolávali Stalin že jo a nu nás fotograf přišel úplně sem byl strašně hrdý že se mě rozuměla a pak sem věděl takže se říká dva zešílení to je že na jedné kde už pak už bylo vše říkalo že venku koho jste si vzal jak jste se seznámili dobře zůstala v Polsku ...
auto keyword    -na-
Evaluation
Evaluation

Question
- How well do we perform?

Criteria
- Effectiveness, efficiency, usability

User–centered strategy
- Given several users and at least two retrieval systems
- Have each user try the same task on both systems
- Measure which system works the "best"

System–centered strategy
- Given documents, queries, and relevance judgements (= a test collection)
- Try several variations on the retrieval systems
- Measure which ranks more good docs near the top
Test collection design

Documents
- Representative sources (interviewees)
- Representative topics (stories)

Topics
- Detailed and structured descriptions of actual information needs
- Used as a basis for query formulation

Relevance judgements
- Document–topic relations (binary relevance)
- Created by humans, perpetually valid
- Sampled, focusing on documents likely to be retrieved
Document collections (Malach)

English
- 297 interviews
- known–boundary condition (segments)
- 8,104 topically–coherent segments
- average 503 words/segment
- ASR: 25% mean Word Error Rate

Czech
- 350 interviews
- unknown segment boundaries
- 3–minute automatically generated passages, with 67% overlap
- start times used as DOCNO for easy results
- ASR: 35% mean Word Error Rate

Distributed to track participants by ELDA
Topic construction

- 115 representative topics developed from actual user requests:
  - Scholars, educators, documentary film makers, and others produced 250 topic–oriented written requests for materials from the collection.
- translated from English into Czech, French, German, Spanish, and Dutch
- TREC–like topic descriptions consist of a title, a short description, and a narrative description:

num    1173
title  Dětské umění v Terezíně [Children's art in Terezín]
desc   Hledáme popis uměleckých aktivit dětí v Terezíně, jako např. hudby, divadla, malování, poezie a jiných psaných děl.
narr   Relevantní materiál by měl obsahovat diskuse o těchto aktivitách a to, jak ovlivnily přečkání holokaustu a následný život dětí. Zejména jsou žádané příběhy, ve kterých účastník rozhovoru uvádí příklady takových aktivit.

num    1431
title  Denní život v Terezíně [Daily life in Terezín]
desc   Popište denní život v táboře Terezín.
narr   Anekdoty, příběhy nebo detaily o denním životě v táboře. Relevantní jsou příběhy, které se zmiňují o modlitbách, svátcích, volném čase vězňů a strukturách vnitřní vlády v táborech. Příběhy, které popisují neobvyklé události v životě vězňů, relevantní nejsou.
Relevance assessment

A manual process to acquire relevance judgements for document–topic pairs. Ideally this would cover all document–topic pairs, which is infeasible.

Search–guided relevance assessment
- for each topic, the set of documents to be judged is restricted to those found potentially relevant by full–text search
- topic research → query formulation → search → judging

Highly ranked (pooled) relevance assessment
- restriction based on the results of actual evaluation runs
- n–deep pools from m systems

[Venn diagram: within the set of All documents, the Relevant set is progressively covered by a 1st round and a 2nd round of judging.]
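The "n-deep pools from m systems" idea above amounts to a set union. A minimal sketch (the run data is invented for illustration):

```python
def build_pool(runs, n):
    """Pooled assessment: judge the union of the top-n documents
    from each of the m submitted ranked runs."""
    judged = set()
    for ranked in runs:          # one ranked doc-id list per system
        judged.update(ranked[:n])
    return judged

run_a = ["d1", "d2", "d3", "d4"]
run_b = ["d2", "d5", "d1", "d6"]
print(sorted(build_pool([run_a, run_b], 2)))  # ['d1', 'd2', 'd5']
```

Documents outside the pool are treated as not relevant, which is why the diagram's judged rounds only approximate the true relevant set.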
Relevance categories

Direct: direct evidence for what the user asks for
  Example: talks about food given to Auschwitz inmates

Indirect: indirect evidence on the topic; data from which one could infer something about the topic
  Example: talks about seeing emaciated people in Auschwitz

Context: background/context for the topic; sets the stage for what the user asks for; sheds additional light on the topic
  Example: talks about physical labor of Auschwitz inmates

Comparison: information on a similar/parallel/contrasting situation
  Example: talks about food in the Warsaw ghetto

Overall: strictly from the point of view of finding out about the topic, how useful this segment is for the requester
  Example: talks about food in Auschwitz

- Five levels of relevance for each category (0 = none, 4 = highly)
- Collapsed to binary relevance using {Direct ≥ 2} ∪ {Indirect ≥ 2}
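The binary-collapse rule above is a one-line predicate over the graded judgements. In this sketch the dictionary layout (lowercase category keys) is an illustrative assumption, not the collection's actual file format:

```python
def is_relevant(judgement):
    """Graded judgement (0-4 per category) -> binary relevance:
    relevant iff Direct >= 2 or Indirect >= 2."""
    return judgement["direct"] >= 2 or judgement["indirect"] >= 2

# Hypothetical judgements for two segments:
j1 = {"direct": 3, "indirect": 0, "context": 4, "comparison": 0, "overall": 3}
j2 = {"direct": 1, "indirect": 1, "context": 4, "comparison": 2, "overall": 2}
print(is_relevant(j1), is_relevant(j2))  # True False
```

Note that a segment judged highly useful only as Context or Comparison (like j2) counts as not relevant under this collapse.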
Effectiveness measures

[Venn diagram: Retrieved and Relevant as overlapping subsets of All documents.]

document      retrieved           not retrieved
relevant      relevant retrieved  relevant missed
not relevant  false alarm         irrelevant rejected

Precision = |relevant retrieved| / |retrieved|

Recall = |relevant retrieved| / |relevant|
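Both measures follow directly from the set picture above. A minimal sketch with invented document ids:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall from sets of document ids."""
    hit = len(retrieved & relevant)      # |relevant retrieved|
    return hit / len(retrieved), hit / len(relevant)

retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d5"}
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.5 and 2/3: 2 hits out of 4 retrieved, out of 3 relevant
```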
Evaluation metrics: example

Precision = |relevant retrieved| / |retrieved|

Recall = |relevant retrieved| / |relevant|

system results (ranked by score):

int.3431-seg.125 0.96
int.7264-seg.068 0.93
int.3012-seg.142 0.88
int.9595-seg.119 0.85
int.2306-seg.120 0.79
int.9262-seg.131 0.75
int.4188-seg.033 0.73
int.6805-seg.007 0.65
int.2591-seg.123 0.62
int.5325-seg.015 0.61
int.6042-seg.038 0.55
int.1066-seg.149 0.54
int.3215-seg.071 0.52
int.4404-seg.023 0.45
int.2012-seg.121 0.34
int.6707-seg.116 0.21

A score threshold turns the ranked list into a binary classification (retrieved vs. not retrieved). As the threshold is lowered step by step, precision and recall change:

retrieved  precision  recall
4          100%       50%
5          80%        50%
6          83%        62%
7          85%        75%
8          75%        75%
9          77%        87%
10         70%        87%
11         72%        100%
Introduction Project Overview Speech Recognition Speech Retrieval Retrieval Evaluation Conclusion
Evaluation metrics: example
Precision =|relevant retrieved ||retrieved |
Recall =|relevant retrieved |
|relevant |
system results
int.3431-seg.125 0.91int.7264-seg.068 0.93int.3012-seg.142 0.88int.9595-seg.119 0.85int.2306-seg.120 0.79int.9262-seg.131 0.75int.4188-seg.033 0.73int.6805-seg.007 0.65int.2591-seg.123 0.62int.5325-seg.015 0.61int.6042-seg.038 0.55int.1066-seg.149 0.54int.3215-seg.071 0.52int.4404-seg.023 0.45int.2012-seg.121 0.34int.6707-seg.116 0.21
classification confusiony
int.3431-seg.125 1int.7264-seg.068 1int.3012-seg.142 1int.9595-seg.119 1int.2306-seg.120 1int.9262-seg.131 1int.4188-seg.033 1int.6805-seg.007 1int.2591-seg.123 1int.5325-seg.015 1int.6042-seg.038 1int.1066-seg.149 1int.3215-seg.071 1int.4404-seg.023 1int.2012-seg.121 1int.6707-seg.116 1
precision recally
100% 12%100% 25%100% 37%100% 50%80% 50%83% 62%85% 75%75% 75%77% 87%70% 87%72% 100%66% 100%61% 100%57% 100%53% 100%50% 100%
42 / 55
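The slide's per-rank numbers can be reproduced with a short sketch. The segment IDs are from the slide; the binary relevance flags are the ones implied by the printed precision/recall sequence.

```python
# Hypothetical re-computation of the example slide's numbers: a ranked
# result list with binary relevance judgments, walked rank by rank.
ranked = [  # (segment id, relevant?)
    ("int.3431-seg.125", 1), ("int.7264-seg.068", 1), ("int.3012-seg.142", 1),
    ("int.9595-seg.119", 1), ("int.2306-seg.120", 0), ("int.9262-seg.131", 1),
    ("int.4188-seg.033", 1), ("int.6805-seg.007", 0), ("int.2591-seg.123", 1),
    ("int.5325-seg.015", 0), ("int.6042-seg.038", 1), ("int.1066-seg.149", 0),
    ("int.3215-seg.071", 0), ("int.4404-seg.023", 0), ("int.2012-seg.121", 0),
    ("int.6707-seg.116", 0),
]
total_relevant = sum(rel for _, rel in ranked)  # 8 in this example

hits = 0
for rank, (seg, rel) in enumerate(ranked, start=1):
    hits += rel
    precision = hits / rank            # |relevant ∩ retrieved| / |retrieved|
    recall = hits / total_relevant     # |relevant ∩ retrieved| / |relevant|
    print(f"{rank:2d}  {seg}  P={precision:.0%}  R={recall:.0%}")
```

Each cutoff rank is treated as "everything retrieved so far", which is why precision can fall while recall only ever grows.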
Precision–recall curves

[Figure: precision–recall curve; x-axis: Recall (0–1), y-axis: Precision (0–1)]

credit to Doug Oard 43 / 55
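A sketch (not from the slides) of how such curves are typically drawn: "interpolated" precision at recall r is the best precision achieved at recall r or higher, which smooths the raw sawtooth points into a non-increasing curve. The sample points below are illustrative.

```python
# Interpolate a precision-recall curve: for each point, take the maximum
# precision at that recall level or any later (higher-recall) point.
def interpolated_curve(points):
    """points: (recall, precision) pairs in rank order (recall non-decreasing)."""
    return [(r, max(p for _, p in points[i:])) for i, (r, _) in enumerate(points)]

raw = [(0.50, 1.00), (0.50, 0.80), (0.62, 0.83), (0.75, 0.85), (1.00, 0.72)]
print(interpolated_curve(raw))
```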
Average precision

I the expected value of precision over all possible values of recall
I equal to the area under the interpolated precision–recall curve (AUC)

[Figure: interpolated precision vs. recall; Average Precision = 0.477]

credit to Doug Oard 44 / 55
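In practice, uninterpolated average precision is computed as the mean of the precision values at the ranks where relevant documents appear. A minimal sketch, using the relevance pattern from the earlier worked example (not the ranking behind the slide's 0.477 figure):

```python
# Average precision: mean of precision-at-rank over the relevant documents.
def average_precision(relevance):
    """relevance: list of 0/1 flags in ranked order (1 = relevant)."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

ap = average_precision([1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0])
print(round(ap, 3))  # ≈ 0.899 for this ranking
```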
Mean average precision

[Figure: average precision per query (queries 1–40); the example query scores 0.477; Mean Average Precision = 0.188]

Measuring improvement

I Meaningful improvement: 0.05 is noticeable, 0.1 makes a difference
I Reliable improvement: Wilcoxon signed–rank test for paired samples
I Maximum precision limit: set by inter–assessor agreement

credit to Doug Oard 45 / 55
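Mean average precision is simply the arithmetic mean of the per-query AP values. A sketch with made-up per-query scores (the slide's 0.188 comes from its own query set):

```python
# MAP over a set of queries; the AP values here are illustrative only.
per_query_ap = [0.477, 0.120, 0.050, 0.301, 0.002]
map_score = sum(per_query_ap) / len(per_query_ap)
print(round(map_score, 3))  # ≈ 0.19

# To decide whether system B reliably beats system A, pair the per-query
# AP values and apply the Wilcoxon signed-rank test (e.g. scipy.stats.wilcoxon).
```

Because AP varies widely across queries, comparing systems on paired per-query scores is more reliable than comparing the two MAP numbers alone.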
Cross-Language Evaluation Forum (CLEF)
I an activity of the DELOS Network of Excellence for Digital Libraries under the Sixth Framework Programme of the European Commission

I develops an infrastructure for testing, tuning, and evaluating IR systems on European languages in both monolingual and cross–language contexts

I creates test suites of reusable data which can be employed by system developers for benchmarking purposes

I offers a series of evaluation tracks to test different aspects of cross–language information retrieval system development:
1. Mono–, Bi– and Multilingual Document Retrieval on News Collections
2. Mono and Cross–Language IR on Structured Scientific Data
3. Interactive Cross–Language IR
4. Multiple Language Question Answering
5. Cross–Language Retrieval in Image Collections
6. Multilingual Web Track (WebCLEF)
7. Cross–Language Speech Retrieval
8. Cross–Language Geographical Retrieval
46 / 55
CLEF 2006 Cross–language speech retrieval track overview
Two tasks: English segments, Czech start times
I Max of 5 official runs per team per task
I Baseline English run: ASR / English TD topics
7 teams / 6 countries
I Canada: Ottawa (EN, CZ)
I Czech Rep: West Bohemia (CZ)
I Ireland: DCU (EN)
I Netherlands: Twente (EN)
I Spain: Alicante (EN), UNED (EN)
I USA: Maryland (EN, CZ)
Schedule
Jan 15   Registration opened
Apr 14   Data release
May 1    Topic release
Jun 6    Submission of runs by participants
Aug 1    Release of relevance assessments and individual results
Aug 8    Submission of papers for Working Notes
Sep 20 Workshop (Alicante, Spain)
47 / 55
English results (MAP)

Queries: title + description
Documents: ASR output

[Figure: mean uninterpolated average precision (0.00–0.10) per team: DCU, Ottawa, Maryland, Twente, UNED, Alicante]

48 / 55
Monolingual vs. cross–language comparisons (MAP)
Queries: title/description/narrative
Documents: ASR output, metadata, both
Team Q D English French Dutch Spanish
Ottawa TDN Meta 0.2902
Twente T Meta 0.2058 80%
DCU TD Both 0.2015 79%
Maryland TD Meta 0.2350 44%
UNED TD Meta 0.1766 51%
Ottawa TDN ASR 0.0768 83% 81%
DCU TD ASR 0.0733 63%
Twente T ASR 0.0495 77%
UNED TD ASR 0.0376 68%
Maryland TD ASR 0.0543 38%
49 / 55
Conclusion
50 / 55
Call for participation: CLEF 2007
I Registration Opens - 15 January 2007
I Data Release - 15 February 2007
I Topic Release - 1 April 2007
I Submission of Runs by Participants - 15 June 2007
I Release of Relevance Assessments and Individual Results - 15 July 2007
I Submission of Paper for Working Notes - 15 August 2007
I Workshop - 19–21 September 2007
1. Multilingual Document Retrieval on News Collections
2. Scientific Data Retrieval
3. Interactive Cross–Language Information Retrieval
4. Multiple Language Question Answering
5. Cross–Language Image Retrieval
6. Cross–Language Spoken Retrieval
7. CLEF Web Track
8. Cross–Language Geographical Information Retrieval
51 / 55
Thank You!
52 / 55
People said ...
Doug Greenberg:
I “We don’t edit any of these interviews. It’s completely raw footage takendirectly from interviews with survivors. It will be broadly accessible, but itwon’t be edited.”
I “Our mission now is to use the archive in educational settings to overcomeprejudice and bigotry.”
Doug Oard:
I “There’s a lot more oral history than anybody even knows about”.
I “It isn’t as good as a human cataloging, but it’s $100 million cheaper.”
I “When you develop this type of technology, you open a lot of doors,”
53 / 55
Links
I Shoah Foundation / Visual History Institutehttp://www.usc.edu/schools/college/vhi/
I Malach Projecthttp://malach.umiacs.umd.edu/
I Cross–Language Evaluation Forumhttp://www.clef-campaign.org/
I Cross–Language Speech Retrieval track of CLEFhttp://clef-clsr.umiacs.umd.edu/
54 / 55