Crowdsourcing metadata for audiovisual collections


from free-text tags to semantic concepts

Lotte Belice Baltussen – Sound and Vision

7 December 2011 | DISH

Waisda? What’s that?

Allows people to annotate audiovisual archive material

in the form of a game.


• Time-related metadata
• Social tagging (bridging the semantic gap)
• Interaction between the archive / broadcaster and the public
• Gathering data for further research
• Efficiency? Annotating video takes up to 5 times the length of the video
• New business model?

Added value

• Netherlands Institute for Sound and Vision (project management, content, research)
• KRO (concept, content, PR)
• VU (research within PrestoPRIME)
• Q42 (developer)

Project partners pilot

Man bijt hond Woordentikkertje

After evaluation:
• Improved interface
• New scoring mechanisms (semantics)
• New content
• More feedback

How does it work?

Players choose from ‘channels’ with different episodes

How does it work?

Scoring:
• Basic rule: players score points when their tag exactly matches the tag entered by another player within 10 seconds
• Multiple other scoring mechanisms to create various tag incentives
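
The basic rule above can be sketched in a few lines of Python. This is a minimal illustration, not the actual Waisda? implementation. The names `TagEntry`, `find_matches` and `score` are hypothetical, as are the assumptions that the 10 seconds are measured in video time, that both players in a match receive points, and that a match is worth one point.

```python
from dataclasses import dataclass

MATCH_WINDOW_SECONDS = 10  # from the slide: tags must match within 10 seconds


@dataclass(frozen=True)
class TagEntry:
    player: str
    tag: str
    video_time: float  # assumption: seconds into the video when the tag was entered


def find_matches(entries, window=MATCH_WINDOW_SECONDS):
    """Return pairs of entries by different players with identical tags
    entered at most `window` seconds apart."""
    ordered = sorted(entries, key=lambda e: e.video_time)
    matches = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.video_time - first.video_time > window:
                break  # entries are sorted, so later ones are even further away
            if first.player != second.player and first.tag == second.tag:
                matches.append((first, second))
    return matches


def score(entries, points_per_match=1):
    """Assumption: both players in a matching pair receive the same points."""
    totals = {}
    for first, second in find_matches(entries):
        for entry in (first, second):
            totals[entry.player] = totals.get(entry.player, 0) + points_per_match
    return totals


entries = [
    TagEntry("anna", "dog", 12.0),
    TagEntry("ben", "dog", 18.5),   # same tag, 6.5 s later: a match
    TagEntry("cara", "dog", 40.0),  # same tag, but too late to match
    TagEntry("anna", "cat", 41.0),  # different tag: no match
]
print(score(entries))
```

Tags are compared with plain string equality here because the slide speaks of an exact match; the other scoring mechanisms mentioned above are not covered by this sketch.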

Scoring as filter

Evaluation

Martorrel

Generating a constant flow of traffic is a challenge! Important: Partners, publicity on external websites with relevant communities and a large number of visitors.

Example FWAW, in one week:

• Tripled the number of tags to 160,000
• Doubled the number of registered players to 362

Outcomes

• Matches in Waisda?
• Matches GTAA / Cornetto
• Stats:
  • 340,551 tags added to 604 items, 42,068 unique tags
  • 39,134 pageviews, 555 registered players, 10,926 visits
  • Average playing time 6 min 45 s, 4,287 sessions

Evaluation: AV documentalist

• Tags mostly describe short fragments and are often not very specific. They don’t describe a programme as a whole.

• BUT! This can be solved by filtering and mapping free-text tags to existing vocabularies.

• The WNW tags were the most useful and specific; content influences specificity.

• Tags can be used in different ways and the relevance varies per user group.

• Documentalists are excited about further development!
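
Mapping free-text tags to an existing vocabulary, as suggested in the list above, could look roughly like the sketch below. It is an illustration only: the function names, the normalisation rule (lower-casing and collapsing whitespace) and the example concept identifiers are all assumptions, and the identifiers are placeholders rather than real GTAA or Cornetto entries.

```python
def normalise(tag):
    """Lower-case a tag and collapse whitespace so near-identical spellings match."""
    return " ".join(tag.lower().split())


def map_tags(tags, vocabulary):
    """Split tags into those found in the vocabulary and those that are not.

    `vocabulary` maps a preferred label to a concept identifier.
    """
    index = {normalise(label): concept for label, concept in vocabulary.items()}
    mapped, unmapped = {}, []
    for tag in tags:
        concept = index.get(normalise(tag))
        if concept is None:
            unmapped.append(tag)
        else:
            mapped[tag] = concept
    return mapped, unmapped


# Hypothetical vocabulary: labels and identifiers are placeholders.
vocabulary = {
    "Amsterdam": "example:place/amsterdam",
    "Hond": "example:subject/hond",
}

mapped, unmapped = map_tags(["amsterdam ", "HOND", "lol"], vocabulary)
print(mapped)
print(unmapped)
```

Tags that stay unmapped are not necessarily useless; they could be kept as free-text search terms or reviewed by hand.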

Evaluation


Source: Jakob Nielsen’s Alertbox, 9 October 2006

‘Fun’ + Competition + Altruism + Content + Reward + … = Motivation

                     Waisda?           Woordentikkertje
Months               8                 4.5
Videos               648               2,892
Players              2,435             689
Tags – total         428,832           392,860
Tags – unique        48,242 (11%)      43,407 (11%)
Matches
• Players            156,546 (37%)     215,156 (55%)
• Geo. names*        6,089 (1.4%)      23,142 (5.8%)
• Persons*           107 (0.25%)       2,423 (0.6%)

* For Waisda? we looked at unique tags, for Woordentikkertje at the total number of tags

Tips and lessons learned so far

• What are your success criteria?
• How do you define your target users, and how do you reach them?
• How do you motivate your target users?
• Read existing reports and literature!
• Keep learning and improving!

And beyond…

• Open Source version of Waisda?
• Crowdsourcing Olympics
• More research into the added value of tags for retrieval (subtitle comparison, tests with various end users, more research on linking semantically rich sources to tags)

Future work

...recommended sources: blogs, feeds, people

• http://museumtwo.blogspot.com/
• http://80gb.wordpress.com/
• http://themuseumofthefuture.com/
• http://www.delicious.com/RuncocoProject/
• @ammeveleigh
• @archivesopen
• @digitalst
• @microtask
• @mia_out
• @museweb
• @runcoco
• @wittylama

This presentation is partly based on Oomen & Aroyo 2011: http://www.slideshare.net/PaulaUdondek/crowdsourcing-in-het-cultureel-erfgoed-kansen-uitdagingen

Thanks!

@lottebelice / [email protected]

Big thank you to:
B&G: @johanoomen / @mbrinkerink
VU: @laroyo / @McHildebrand

http://blog.waisda.nl
http://woordentikkertje.manbijthond.nl
