PISA: Production, Indexing and Search
of Audio-visual Material
The mathematical logic behind search and retrieval
of audiovisual material
Valérie De Witte, VRT-medialab
medialab81
Archiving
archive number : ALG 20010813 1
fragment number : 1
series : 1000 ZONNEN EN GARNALEN
tape number : E03024404
format : DBCM
fragment title : 1000 ZONNEN & GARNALEN
picture : KL/PALPLUS
fragment duration : 18 20
text : 0'00" TOURIST REPORTAGE MAGAZINE, OVERVIEW OF
TOPICS. OPENING TITLES OF TOURIST REPORTAGE MAGAZINE,
OVERVIEW OF TOPICS
0'50" VANDAAG : ARTIST LUC HOFKENS DESIGNED AN OASIS
ON HIS ROOF TERRACE IN BORGERHOUT THAT RECALLS THE
GRAND CANYON. INTERVIEW WITH LUC AND HIS WIFE
MARILOU. EXTERIOR SHOTS: ROOF WITH SURROUNDINGS, OUTSIDE
OF WORKER'S HOUSE, PAN OVER ROCK WALLS, CRATES WITH WATER,
PLANTING, PHOTO ALBUM SHOWING THE PROGRESS OF THE WORKS
4'00" JUNIOR : KLAARTJE ALAERTS, 13 YEARS OLD, WANTS TO BECOME
AN ASTRONAUT. SHE VISITS THE EURO SPACE CENTER WITH SPACE
SHUTTLES, ROCKETS. SIMULATION IN SPACE SHUTTLE, INTERVIEW, HAS
SEEN A UFO. MAKES A SMALL ROCKET HERSELF, LAUNCHES IT
7'50" DE SCHEURKALENDER : ARCHIVE IBM COMMERCIAL.
INTERVIEW MAURICE DE WILDE, FIRST PERSONAL COMPUTER
keywords : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND
CANYON (NATURE AREA); ROOF; TERRACE; INTERVIEW; EURO
SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH;
PASSENGER; GASTRONOMY; RESTAURANT; STAFF;
HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT;
LOTTO; RADIO PRESENTER; SOUND STUDIO; INVENTION;
BARBECUE; CONCRETE MIXER; IBM; COMMERCIAL
rights holder : VRT
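The text field of the record above is time-coded with markers of the form minutes'seconds". As a minimal sketch of how such a field could be split into segments and searched, the Python below parses the markers and returns the start time of the first segment that mentions a term. The function names and the abbreviated sample string are hypothetical illustrations, not part of the archive system.

```python
import re

# Matches time codes such as 0'50" or 7'50" (minutes'seconds").
TIMECODE = re.compile(r"""(\d+)'(\d{2})"\s*""")

def parse_segments(text):
    """Split a time-coded description into (start_seconds, description) pairs."""
    matches = list(TIMECODE.finditer(text))
    segments = []
    for i, match in enumerate(matches):
        start = int(match.group(1)) * 60 + int(match.group(2))
        # A segment runs until the next time code, or to the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segments.append((start, text[match.end():end].strip()))
    return segments

def find_start(segments, term):
    """Return the start time (seconds) of the first segment mentioning term, else None."""
    for start, description in segments:
        if term.lower() in description.lower():
            return start
    return None

# Abbreviated, hypothetical stand-in for the record's text field.
SAMPLE = """0'00" OVERVIEW 0'50" VANDAAG 4'00" JUNIOR 7'50" DE SCHEURKALENDER IBM"""

segments = parse_segments(SAMPLE)
ibm_start = find_start(segments, "ibm")  # 470 seconds, i.e. 7'50"
```

A start time obtained this way is what would let a search client jump to the matching part of the fragment rather than to its beginning.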
Search screen FILM Set: 16 Count: 1
page 1 of 3
keywords: ibm and vrt
archive number: -
broadcast year: month: day:
fragment number: fragment duration:
series:
format: tape number:
episode: episode number:
programme: broadcast date:
fragment title:
text:
category:
recording date: recording number:
journalist: rights holder:
SETS
The strings required for the operation are not defined
F11 F12 F13 F14 F17 F18 F19 F20 Ent
Exit Sets Refset Show Previous Next/Clear Thesaurus Command Search
Issues
-> “Annotation” provides structured metadata and
must scale with the growing volume
of information
-> Automated processing of information is a key
issue, but it requires correct and structured
metadata
-> Product Engineering is the source of structured
and meaningful information
Alternative solution
Milestone 1 – Searching Audiovisual Material
[Diagram labels: Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); Search Client (Custom Development); Legacy Video Library (Basisplus); Current news items (Ardome); Raw Material (EBU Superpop); NewsML-G2]
Assumptions:
• A “scene” is the logical unit of search
The ideal search engine:
• retrieves all relevant items (recall 100%)
• without false positives (precision 100%)
• provides grouping of similar results
• gives instant access to digital media
• while respecting intellectual property rights.
Milestone 2 – Computer Assisted Analysis
• Shot segmentation
• Audio classification
• Face detection
• Face recognition
• Scene detection
• Subtitle processing
• Topic recognition
[Diagram labels: Media Production; Shot Segmentation; Scene Detection; Face Detection; Topic Recognition; Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); Legacy Video Library (Basisplus); Current news items (Ardome); Raw Material (EBU Superpop); NewsML-G2]
Search systems
Current search implementations are excellent in terms of search capabilities:
- Boolean logic (AND, OR and NOT operators)
- truncation (plurals, stemming, capital letters)
- thesaurus (synonyms, homonyms, …)
- structured metadata and range search
- single-word and phrase searching
But retrieval efficiency also depends on:
- coverage (composition of the index used, which parts of the documents
are indexed, update frequency)
- response time (average waiting time between issuing a search
command and displaying the first batch of results on the screen)
- user effort (user-friendly interface)
- output options (number of output options, layout, clarity)
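The Boolean operators listed above can be illustrated with set operations on an inverted index. The sketch below is a minimal illustration, assuming a tiny hypothetical collection whose keywords are borrowed from the archive record shown earlier; it is not how Basisplus or Lucene/SOLR implement Boolean search.

```python
from collections import defaultdict

# Hypothetical mini-collection: item id -> keyword set.
ITEMS = {
    1: {"IBM", "VRT", "PC"},
    2: {"VRT", "LOTTO"},
    3: {"IBM", "BARBECUE"},
}

def build_index(items):
    """Map each keyword to the set of item ids that carry it (an inverted index)."""
    index = defaultdict(set)
    for item_id, keywords in items.items():
        for keyword in keywords:
            index[keyword.upper()].add(item_id)
    return index

INDEX = build_index(ITEMS)
ALL_IDS = set(ITEMS)

def lookup(keyword):
    """Case-insensitive lookup; unknown keywords match nothing."""
    return INDEX.get(keyword.upper(), set())

def and_(a, b):
    return lookup(a) & lookup(b)   # items carrying both keywords

def or_(a, b):
    return lookup(a) | lookup(b)   # items carrying at least one keyword

def not_(a):
    return ALL_IDS - lookup(a)     # items that do not carry the keyword

both = and_("ibm", "vrt")      # the query "ibm and vrt"
either = or_("ibm", "vrt")
without_ibm = not_("ibm")
```

With this sample data the query "ibm and vrt" matches only item 1, while "ibm or vrt" matches all three items.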
Qualitative evaluation
-> $\text{precision} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|}$
- fraction of the returned results that are relevant
- requires knowledge of the relevant and non-relevant hits in the
set of retrieved documents
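As a small worked example of the definition above, the following Python sketch computes precision from two sets of document ids. The ids are made up for illustration, and returning 0.0 for an empty result set is an assumption of this sketch, since the definition is undefined in that case.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # assumption: treat an empty result set as precision 0
    return len(relevant & retrieved) / len(retrieved)

# Hypothetical ids: 4 documents retrieved, of which 2 are relevant.
retrieved_docs = {1, 2, 3, 4}
relevant_docs = {2, 4, 5}
p = precision(retrieved_docs, relevant_docs)  # 2 / 4 = 0.5
```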
Qualitative evaluation
-> $\text{recall} = \dfrac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|}$
- fraction of the relevant documents in the collection that are
retrieved
- requires knowledge not only of the relevant and retrieved
documents but also of those not retrieved
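The recall definition above can be sketched in Python in the same way. Note that the function needs the full set of relevant documents, including those the search did not return, which is exactly the knowledge requirement stated above. The ids are made up, and returning 0.0 when nothing is relevant is an assumption of this sketch.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant documents in the collection that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0  # assumption: no relevant documents means recall 0
    return len(relevant & retrieved) / len(relevant)

# Hypothetical ids: 3 relevant documents exist, 2 of them were retrieved.
retrieved_docs = {1, 2, 3, 4}
relevant_docs = {2, 4, 5}   # document 5 is relevant but was not retrieved
r = recall(retrieved_docs, relevant_docs)  # 2 / 3
```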
Qualitative evaluation
• There is often an inverse relationship between precision and recall:
increasing one tends to reduce the other
• Which of the two matters more depends on the use case
-> in some use cases only the hits at the top of the list have to be
relevant, and there is no interest in looking at every relevant
document (high precision)
-> in other use cases we want recall to be as high as possible and
will tolerate low precision
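The trade-off described above can be made concrete by evaluating a ranked result list at different cut-off points. The sketch below is a minimal illustration with a made-up ranking and a made-up set of relevant documents: looking only at the top hit gives high precision and low recall, while reading the whole list gives full recall at lower precision.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Precision and recall when only the top k ranked results are considered."""
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ranking returned by a search engine, best match first.
ranked_results = ["a", "b", "c", "d", "e", "f"]
relevant_docs = {"a", "c", "f"}

top_1 = precision_recall_at_k(ranked_results, relevant_docs, 1)  # precision 1.0, recall 1/3
top_3 = precision_recall_at_k(ranked_results, relevant_docs, 3)  # precision 2/3, recall 2/3
top_6 = precision_recall_at_k(ranked_results, relevant_docs, 6)  # precision 0.5, recall 1.0
```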
Trouvaille
[Figure: precision versus recall plot showing the position of the current search ("Actual Search")]
Trouvaille
• Thesaurus application:
  - During search: keywords in auto-completion, spellcheck and
  synonyms
• User-friendly interface:
  - Faceted search: programme, genre, journalist
  - Different output views: keywords, thumbnails, Google Maps
• Use of a standard: NewsML-G2
• Metadata is time-coded
-> Matching keyframe
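Faceted search, as mentioned above, amounts to counting the values of selected metadata fields across the current hits and letting the user narrow the result set by one of those values. The Python below is a minimal sketch of that idea; the field names, genres and journalist names are hypothetical sample data, and this is not a description of how Trouvaille or Lucene/SOLR computes facets.

```python
from collections import Counter

# Hypothetical search hits with the three facet fields.
HITS = [
    {"id": 1, "programme": "1000 ZONNEN EN GARNALEN", "genre": "magazine", "journalist": "Journalist A"},
    {"id": 2, "programme": "1000 ZONNEN EN GARNALEN", "genre": "magazine", "journalist": "Journalist B"},
    {"id": 3, "programme": "News", "genre": "news", "journalist": "Journalist A"},
]

def facet_counts(hits, facets):
    """Count how many hits carry each value of each facet field."""
    return {
        facet: Counter(hit[facet] for hit in hits if facet in hit)
        for facet in facets
    }

def refine(hits, facet, value):
    """Keep only the hits whose facet field equals the chosen value."""
    return [hit for hit in hits if hit.get(facet) == value]

counts = facet_counts(HITS, ["programme", "genre", "journalist"])
magazine_hits = refine(HITS, "genre", "magazine")
```

The counts are what an interface would display next to each facet value; selecting a value calls the refinement step and the counts are recomputed on the narrowed set.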
Trouvaille: future work
• Clustering: integration of copy detection to
find duplicates in the retrieved hits
• Intelligent Information Clustering: detection of concept
relationships
• Feature extraction: topic detection
• Combination of system quality and user
satisfaction for the evaluation
[Figure: precision versus recall plot (both axes up to 100%) positioning Actual Search, Trouvaille (MS1), Feature extraction and Intelligent Information clustering]
Trouvaille