DESCRIPTION
Poster version of earlier work, presented at ICT.OPEN 2012. Original paper: Discovering User Perceptions of Semantic Similarity in Near-duplicate Multimedia Files. In Proc. of the 1st International Workshop on Crowdsourcing Web Search, Lyon, France, April 17, 2012, CEUR-WS.org. Available online: http://msp.ewi.tudelft.nl/sites/default/files/crowdsearch2012-vliegendhart.pdf.
One of These Things is Not Like the Other:
Crowdsourcing Semantic Similarity of Multimedia Files
Raynor Vliegendhart*, Martha Larson*, and Johan Pouwelse**
ICT.OPEN 2012, Rotterdam, The Netherlands, 2012
Problem
● Problem: What constitutes a near duplicate?
For example: Are these two files the same? Why (not)?
Yes: It’s the same song.
No: These are different performances by different performers.
Definition:
Functional near-duplicate multimedia items are items that fulfill the
same purpose for the user. Once the user has one of these items,
there is no additional need for another.
● Task: Discovering new notions of user-perceived similarity between
multimedia files in a file-sharing setting.
● Motivation: Clustering items in search results.
Approach
● Idea: Pick the odd one out, inspired by Sesame Street’s
“one of these things is not like the other”.
● Crowdsourcing Task:
● 3 multimedia files displayed as search results
● Worker picks the odd one out and justifies the choice.
● Challenge: Eliciting serious judgments
Contact: [email protected]
Multimedia Information Retrieval Lab*
Delft University of Technology
Parallel and Distributed Systems Group**
Delft University of Technology
@ShinNoNoir
Chrono Cross - 'Dream of the
Shore Near Another World'
Violin/Piano Cover
(YouTube: IQYNEj51EUI)
Chrono Cross Dream of the
Shore Near Another World
Violin and Piano
(YouTube: Iuh3YrJtK3M)
Screenshots from Tribler 5.4 (tribler.org)
HIT Design
Amazon Mechanical Turk (AMT) is a crowdsourcing platform
to which Human Intelligence Tasks (HITs) can be submitted.
Careful phrasing in our HIT is important for eliciting serious judgments:
● Ask workers to picture using the files: “Imagine that you download
the three items in the list and that you view them.”
● Don’t force workers to make a contrast.
● Explain the definition of functional similarity.
Harry Potter and the Sorcerers Stone Audio
Book (478 MB)
Harry Potter and the Sorcerer s Stone
(2001)(ENG GER NL) 2Lions- (4.36 GB)
Harry Potter.And.The.Sorcerer.Stone.DVDR.
NTSC.SKJACK.Universal.S (4.46 GB)
o The items are comparable. They are for all practical purposes the
same. Someone would never really need all three of these.
o Each item can be considered unique. I can imagine that someone
might really want to download all three of these items.
o One item is not like the other two. (Please mark that item in the list.)
The other two items are comparable.
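The three answer options above can be modelled as a small record type. The following is a minimal sketch, not the authors' implementation: the names Verdict, Judgment, odd_index, and justification are hypothetical and chosen only for illustration. It encodes one rule stated in the HIT text, namely that an item is marked only when the worker selects the third option.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Verdict(Enum):
    """The three answer options shown to the worker (names are hypothetical)."""
    ALL_COMPARABLE = "comparable"   # "The items are comparable."
    ALL_UNIQUE = "unique"           # "Each item can be considered unique."
    ODD_ONE_OUT = "odd_one_out"     # "One item is not like the other two."


@dataclass(frozen=True)
class Judgment:
    """One worker's answer for one triad of search results."""
    triad: Tuple[str, str, str]      # the three filenames shown
    verdict: Verdict
    odd_index: Optional[int] = None  # position (0-2) of the marked item, if any
    justification: str = ""          # the worker's free-text explanation

    def __post_init__(self):
        # Only the odd-one-out option asks the worker to mark an item.
        if self.verdict is Verdict.ODD_ONE_OUT:
            if self.odd_index not in (0, 1, 2):
                raise ValueError("odd-one-out verdict needs a marked item (0, 1, or 2)")
        elif self.odd_index is not None:
            raise ValueError("only an odd-one-out verdict may mark an item")
```

For the Harry Potter triad above, a worker who singles out the audio book could be recorded as Judgment(triad, Verdict.ODD_ONE_OUT, odd_index=0, justification="..."), with the free text kept for the later card sort.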
Experiments
● Dataset:
● Popular file-sharing site: The Pirate Bay (thepiratebay.se).
● 75 queries derived from Top 100 list.
● 32,773 filenames and metadata.
● 1000 random triads sampled from search results.
● Crowdsourcing Experiment:
● Recruitment HIT and Main HIT run concurrently on AMT.
● 8 out of 14 qualified workers produced free-text judgments
for 308 triads within 36 hours.
● Card Sort:
● Group similar judgments into piles,
merge piles iteratively, and finally
label each pile.
● End result: 44 user-perceived
dimensions of similarity discovered.
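The triad sampling step listed under Dataset could look roughly like the sketch below. The poster does not specify the sampling procedure, so this rests on assumptions: each triad is drawn from the results of a single query (in line with the three files being displayed together as search results), queries with fewer than three results are skipped, and the function name sample_triads and its parameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def sample_triads(
    results_by_query: Dict[str, List[str]],
    n_triads: int,
    seed: int = 0,
) -> List[Tuple[str, str, str, str]]:
    """Draw n_triads random (query, file_a, file_b, file_c) tuples.

    Assumption: all three files in a triad come from the same query's results.
    """
    rng = random.Random(seed)  # seeded so a sample can be reproduced
    # Only queries with at least three results can yield a triad.
    eligible = [q for q, files in results_by_query.items() if len(files) >= 3]
    if not eligible:
        raise ValueError("no query has three or more results")

    triads = []
    for _ in range(n_triads):
        query = rng.choice(eligible)
        # rng.sample picks three distinct positions from the result list.
        a, b, c = rng.sample(results_by_query[query], 3)
        triads.append((query, a, b, c))
    return triads
```

This version picks a query uniformly and may return the same triad twice; whether the original sampling weighted queries by result count or removed repeats is not stated on the poster.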
Conclusion
● Wealth of user-perceived dimensions of similarity discovered.
● Quick results due to interesting crowdsourcing task.