PISA: Production, Indexing and Search of Audio-visual Material. The mathematical logic behind search and retrieval of audiovisual material. Valérie De Witte, VRT-medialab

search and retrieval of audiovisual material


Page 1: search and retrieval of audiovisual material

PISA: Production, Indexing and Search of Audio-visual Material

The mathematical logic behind search and retrieval of audiovisual material

Valérie De Witte, VRT-medialab

Page 2: search and retrieval of audiovisual material

medialab81

Archiving

archive number : ALG 20010813 1
fragment number : 1
series : 1000 ZONNEN EN GARNALEN
tape number : E03024404
format : DBCM
fragment title : 1000 ZONNEN & GARNALEN
picture : KL/PALPLUS
fragment duration : 18 20
text :
0'00" TOURIST REPORTAGE MAGAZINE, OVERVIEW OF TOPICS; OPENING TITLES OF TOURIST REPORTAGE MAGAZINE, OVERVIEW OF TOPICS
0'50" TODAY: ARTIST LUC HOFKENS DESIGNED AN OASIS ON HIS ROOF TERRACE IN BORGERHOUT THAT RECALLS THE GRAND CANYON; INTERVIEW WITH LUC AND HIS WIFE MARILOU; EXTERIOR SHOT OF ROOF WITH SURROUNDINGS, EXTERIOR OF WORKER'S HOUSE, PAN ACROSS ROCK WALLS, CRATES WITH WATER, PLANTING, PHOTO ALBUM SHOWING PROGRESS OF THE WORKS
4'00" JUNIOR: KLAARTJE ALAERTS, AGE 13, WANTS TO BECOME AN ASTRONAUT; SHE VISITS THE EUROSPACE CENTER WITH SPACE SHUTTLES, ROCKETS; SIMULATION IN SPACE SHUTTLE; INTERVIEW; HAS SEEN A UFO; BUILDS A SMALL ROCKET HERSELF AND LAUNCHES IT
7'50" DE SCHEURKALENDER: ARCHIVE ADVERTISING FILM IBM; INTERVIEW MAURICE DE WILDE; FIRST PERSONAL COMPUTER
keywords : BELGIUM; BORGERHOUT; ARTIST; OASIS; ART; GRAND CANYON (NATURE AREA); ROOF; TERRACE; INTERVIEW; EURO SPACE CENTER; SPACE TRAVEL; PC; BOAT TRIP; WEALTH; PASSENGER; GASTRONOMY; RESTAURANT; STAFF; HOLIDAY; INTERIOR SHOT; SHIP; BECKERS LEEN; VRT; LOTTO; RADIO ANNOUNCER; SOUND STUDIO; INVENTION; BARBECUE; CONCRETE MIXER; IBM; ADVERTISING SPOT
rights holder : VRT

Search screen FILM Set: 16 Count: 1
page 1 of 3
keywords: ibm and vrt
archive number: -
broadcast year: month: day:
fragment number: fragment duration:
series:
format: tape number:
episode: episode number:
programme: broadcast date:
fragment title:
text:
category:
recording date: recording number:
journalist: rights holder:
SETS
The strings required for the operation are not defined
F11 F12 F13 F14 F17 F18 F19 F20 Ent
Exit Sets Refset Show Previous Next/Clear Thesaurus Command Search

Page 3: search and retrieval of audiovisual material


Issues

-> “Annotation” provides structured metadata and needs to scale with the growing volume of information

-> Automated processing of information is a key issue, but it requires correct and structured metadata

-> Product Engineering is the source of structured and meaningful information

Page 4: search and retrieval of audiovisual material


Alternative solution

Page 5: search and retrieval of audiovisual material


Milestone 1 – Searching Audiovisual Material

[Diagram: Legacy Video Library (Basisplus), Actual news items (Ardome), Raw Material (EBU Superpop), NewsML-G2, Media Asset Management System (Ardome), Search Engine (Lucene/SOLR), Search Client (Custom Development)]

Assumptions:

• A “scene” is the logical unit of search

The ideal search engine:

• retrieves all relevant items (recall 100%)

• without false positives (precision 100%)

• provides grouping of similar results

• gives instant access to digital media

• with respect to intellectual property.

Page 6: search and retrieval of audiovisual material


Milestone 2 – Computer Assisted Analysis

• Shot segmentation

• Audio classification

• Face detection

• Face recognition

• Scene detection

• Subtitle processing

• Topic recognition

[Diagram: Media Production; Shot Segmentation, Scene Detection, Face Detection, Topic Recognition; Media Asset Management System (Ardome); Search Engine (Lucene/SOLR); Legacy Video Library (Basisplus); Actual news items (Ardome); Raw Material (EBU Superpop); NewsML-G2]

Page 7: search and retrieval of audiovisual material


Search systems

Current search implementations are excellent in terms of search capabilities:

- Boolean logic (AND-, OR- and NOT-operators)

- truncation (plural, stemming, capital letters)

- thesaurus (synonyms, homonyms,…)

- structured metadata and range search

- single word and phrase searching
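The Boolean operators listed above can be illustrated with a small inverted index. This is a minimal sketch, not the behaviour of Basisplus or Lucene/SOLR: the three records and the helper names (`build_index`, `search_and`, `search_or`, `search_not`) are hypothetical, and only the keywords themselves are borrowed from the archive record shown earlier.

```python
from collections import defaultdict

# Hypothetical mini-archive: record id -> set of keywords.
RECORDS = {
    1: {"IBM", "VRT", "PC", "INTERVIEW"},
    2: {"IBM", "BARBECUE"},
    3: {"VRT", "LOTTO"},
}


def build_index(records):
    """Map each lower-cased keyword to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, keywords in records.items():
        for keyword in keywords:
            index[keyword.lower()].add(rec_id)
    return index


def search_and(index, *terms):
    """Records that contain every term (AND)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def search_or(index, *terms):
    """Records that contain at least one term (OR)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_ids, term):
    """Records that do not contain the term (NOT)."""
    return set(all_ids) - index.get(term.lower(), set())


INDEX = build_index(RECORDS)
print(search_and(INDEX, "ibm", "vrt"))    # same form as the query "ibm and vrt"
print(search_or(INDEX, "ibm", "lotto"))
print(search_not(INDEX, RECORDS, "ibm"))
```

Lower-casing the keywords at index time and at query time is one simple way to make the search insensitive to capital letters; stemming and thesaurus expansion would be further steps that this sketch leaves out.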

But… retrieval efficiency

- coverage (composition of the index used, which parts of the documents are indexed, update frequency)

- response time (average waiting time between issuing a search command and displaying the first batch of results on the screen)

- user effort (user-friendly interface)

- output options (number of output options, layout, clarity)

Page 8: search and retrieval of audiovisual material


Qualitative evaluation

-> precision:

\[ \text{precision} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{retrieved documents}\}|} \]

- fraction of the returned results that are relevant

- requires knowledge of the relevant and non-relevant hits in the set of retrieved documents

Page 9: search and retrieval of audiovisual material


Qualitative evaluation

-> recall:

\[ \text{recall} = \frac{|\{\text{relevant documents}\} \cap \{\text{retrieved documents}\}|}{|\{\text{relevant documents}\}|} \]

- fraction of the relevant documents in the collection that are retrieved

- requires knowledge not only of the relevant and retrieved documents but also of those not retrieved
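The two definitions of precision and recall can be computed directly from sets of document identifiers. The following is a minimal sketch; the function name `precision_recall` and the example identifiers are hypothetical, and returning 0.0 for an empty set is a convention chosen here, not something the slides specify.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # relevant documents that were retrieved
    # Assumption: an empty denominator yields 0.0 instead of raising an error.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant documents exist, 3 documents are returned,
# and 2 of the returned documents are relevant.
relevant = {"a", "b", "c", "d"}
retrieved = {"a", "b", "x"}
precision, recall = precision_recall(relevant, retrieved)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

In this example precision is 2/3 and recall is 2/4. Note that computing recall needs the full set of relevant documents, including "c" and "d", which the search never returned.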

Page 10: search and retrieval of audiovisual material


Qualitative evaluation

• There is often an inverse relationship between precision and recall: increasing one tends to reduce the other

• Which of the two matters more depends on the use case

-> in some use cases only the hits at the top of the list have to be relevant, and there is no interest in looking at every relevant document (high precision)

-> in other use cases we want recall to be as high as possible and tolerate low precision
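The trade-off can be made concrete by cutting a ranked result list at different depths k. This is a minimal sketch with a hypothetical ranking and hypothetical relevance judgements; `precision_at_k` and `recall_at_k` are illustrative helper names.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / len(top)


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


# Hypothetical ranked result list and relevance judgements.
ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d2", "d5"}

for k in (1, 2, 3, 6):
    p = precision_at_k(ranked, relevant, k)
    r = recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

With these data, looking only at the top two hits gives precision 1.0 but recall 2/3, while reading all six results raises recall to 1.0 and lowers precision to 0.5.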

Page 11: search and retrieval of audiovisual material


Trouvaille

[Chart: precision versus recall, positioning Google and Actual Search]

Page 12: search and retrieval of audiovisual material


Trouvaille

• Thesaurus application:

  • During search: keywords in auto-completion, spellcheck and synonyms

• User-friendly interface:

  • Faceted search: programme, genre, journalist

  • Different output views: keywords, thumbnails, Google Maps

• Use of the NewsML-G2 standard

• Metadata is time-coded

-> Matching keyframe
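One way time-coded metadata could lead to a matching keyframe is a lookup from the time of a hit to the scene that contains it. The sketch below is an assumption about how such a lookup might work, not a description of Trouvaille: the scene start times are the time codes of the archive record shown earlier (0'00", 0'50", 4'00", 7'50"), while the short labels, the `scene_at` helper and the idea of taking the keyframe at the scene start are hypothetical.

```python
from bisect import bisect_right

# (start time in seconds, label), sorted by start time.
# Start times come from the archive record; labels are shortened for illustration.
SCENES = [
    (0, "overview of topics"),
    (50, "roof-terrace oasis"),
    (240, "Euro Space Center"),
    (470, "IBM advertising film"),
]


def scene_at(scenes, seconds):
    """Return the (start, label) scene that contains the given time."""
    starts = [start for start, _ in scenes]
    position = bisect_right(starts, seconds) - 1
    if position < 0:
        raise ValueError("time lies before the first scene")
    return scenes[position]


# A keyword hit at 8'00" (480 s) falls inside the scene starting at 7'50".
start, label = scene_at(SCENES, 480)
print(f"hit at 480 s -> scene '{label}', keyframe assumed at {start} s")
```

Because the scene is treated as the logical unit of search, a binary search over scene start times is enough; no per-frame index is needed for this step.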

Page 13: search and retrieval of audiovisual material


Trouvaille: future work

• Clustering: integration of copy detection to find duplicates in the retrieved hits

• Intelligent information clustering: detection of concept relationships

• Feature extraction: topic detection

• Combination of system quality and user satisfaction for the evaluation

[Chart: precision (up to 100%) versus recall (up to 100%), positioning Google, Actual Search and Trouvaille (MS1), with labels for Feature extraction and Intelligent Information clustering]

Page 14: search and retrieval of audiovisual material


Trouvaille