Pictorial View of CUbRIK research from Posters
How Do We Deep-Link? Leveraging User-Contributed
Time-Links for Non-Linear Video Access
Raynor Vliegendhart, Babak Loni, Martha Larson, and Alan Hanjalic
21st ACM international conference on Multimedia, Barcelona, Spain, 2013
Future Work
● Improve automatic classification by adding content features
● Develop the envisioned deep-link retrieval system
Contact: [email protected]
Results
● Annotation agreement:
● VER/non-VER: 2,842 comments (84.6%)
● VERV: 2,140 comments (63.7%)
● Automatic classification results:
● Misclassification challenges:
● “Funny” comments often labeled as “here” by human annotators,
but classified as “love” by the classifier
● Comments with multiple interpretations
● Comments with multiple sentences
Multimedia Information Retrieval Lab, Delft University of Technology
@ShinNoNoir
Introduction
● Problem: How do users deep-link? (i.e., refer to points in a video by explicitly mentioning time-codes)
● Motivation: Leverage time-codes within deep-link comments
for enabling non-linear video access
● Dataset: MSRA-MM 2.0 / YouTube comments
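Detecting deep-link comments amounts to scanning comment text for explicit time-codes. A minimal sketch (the regex and helper name are illustrative assumptions, not from the paper):

```python
import re

# Matches time-codes such as "0:44", "2:11" or "1:02:15" (hours optional).
TIMECODE_RE = re.compile(r'\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b')

def extract_timecodes(comment):
    """Return every time-code in a comment as seconds from the video start."""
    times = []
    for hours, minutes, seconds in TIMECODE_RE.findall(comment):
        times.append(int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds))
    return times
```

A comment is then a deep-link candidate whenever extract_timecodes returns a non-empty list.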
Contributions
● Notion of Viewer Expressive Reaction (VER):
Reflects viewers’ perceptions of noteworthiness
(but extending beyond depicted content and induced affect)
● Viewer Expressive Reaction Variety taxonomy (VERV):
Captures how users deep-link;
Shown to be appropriate for automatic filtering
Approach
● Taxonomy elicitation via crowdsourcing (Amazon Mechanical Turk):
● Given: 3 deep-link comments per video
● Task: Describe why a comment was posted (2–4 sentences)
● Post-processing: Card-sorting technique
● Annotation crowdsourcing task, for:
● Validating elicited VERV taxonomy
● Annotating 3,359 deep-link comments:
● Whether it contains a true deep-link (VER/non-VER)
● VERV class (if and only if VER comment)
● Linear SVM comment classification experiment (unigram features)
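The posters give no implementation details for the classification experiment, so the sketch below uses a simple one-vs-rest perceptron over unigram counts as a pure-Python stand-in for the linear SVM (with real data one would use e.g. scikit-learn's LinearSVC on CountVectorizer unigram features); the toy comments and labels are illustrative, not actual VERV data:

```python
from collections import defaultdict

def unigrams(text):
    # Unigram features: lowercased whitespace tokens.
    return text.lower().split()

def train_perceptron(data, epochs=20):
    """Train a one-vs-rest perceptron on (comment, label) pairs."""
    labels = sorted({y for _, y in data})
    w = {y: defaultdict(float) for y in labels}
    for _ in range(epochs):
        for text, y in data:
            feats = unigrams(text)
            pred = max(labels, key=lambda l: sum(w[l][f] for f in feats))
            if pred != y:  # mistake-driven update
                for f in feats:
                    w[y][f] += 1.0
                    w[pred][f] -= 1.0
    return w, labels

def predict(model, text):
    w, labels = model
    feats = unigrams(text)
    return max(labels, key=lambda l: sum(w[l][f] for f in feats))
```

Training on a handful of labeled comments and predicting the class of a new one exercises the whole pipeline.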
Example deep-link comments (these occur unprompted on social video sharing platforms):
● “0:44 omg so cute”
● “i liked it till 2:11 then it became boring”
● “what’s the breed of the cat at 2:14?”
● “5:17 Damn!”
● “That move at 6:12 was dumb”
● “2:59 That didn’t go too well”
● “The song at 3:33 is called ‘The eye of the tiger’”
Envisioned Future Retrieval System
[Figure: mock-up of the envisioned deep-link retrieval system. Queries such as “cats at play +surprise” and “epic failure at 0:23” return videos (“Stalking cat” by lowdope, 4 years ago, 4,435 views, “Moire blog: http://moire.lowdope.com/”; “Wrestling kittens” by Rizzzalie, 5 years ago, 8,136 views, “World wrestling federation, young feline division”; “Funny cats in water” by NekoTV, 2 years ago, 3,491 views, “Funny cats in and around water”), each with a clickable deep-link jump point (1:14, 0:39, 3:24), alongside user comments such as “1:12 omg, that’s impossible!” (Nyaaa), “O_o now I didn’t expect that at all! 0:33” (Xfade), and “whoa. unreal creepy eyes at 1:02” (jb87).]
CONTEXT-BASED PEOPLE RECOGNITION
in CONSUMER PHOTO COLLECTIONS
Markus Brenner, Ebroul Izquierdo
MMV Research Group, School of Electronic Engineering and Computer Science
Queen Mary University of London, UK
{markus.brenner, ebroul.izquierdo}@eecs.qmul.ac.uk
Face Detection and Basic Recognition
Initial steps: Image preprocessing, face detection and face normalization
Descriptor-based: Local Binary Pattern (LBP) texture histograms
Similarity metric: Chi-Square Statistics
Basic face recognition: k-Nearest-Neighbor
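The basic recognition step (LBP histograms compared with the Chi-Square statistic, classified by k-Nearest-Neighbor) can be sketched as follows, assuming the per-face LBP histograms have already been computed; the gallery layout and function names are illustrative:

```python
def chi_square(h1, h2, eps=1e-10):
    """Chi-Square distance between two (normalized) LBP histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def knn_identify(query_hist, gallery, k=3):
    """gallery: list of (person_id, lbp_histogram) pairs.
    Majority vote over the k nearest faces under Chi-Square distance."""
    nearest = sorted(gallery, key=lambda item: chi_square(query_hist, item[1]))[:k]
    votes = {}
    for person, _ in nearest:
        votes[person] = votes.get(person, 0) + 1
    return max(votes, key=votes.get)
```

In the full system this kNN decision only seeds the unary potentials; the graph-based model below refines it jointly.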
Graph-based Recognition
Model: pairwise Markov Network (graph nodes represent faces)
Unary Potentials: likelihood of faces belonging to
particular people
Pairwise Potentials: encourage spatial smoothness,
encode exclusivity constraint and temporal domain
Topology: only the most similar faces are
connected with edges
Inference: maximum a posteriori (MAP)
solution of Loopy Belief Propagation (LBP)
Social Semantics
Individual appearance for a more effective graph
topology (used to regularize the number of edges)
Unique People Constraint models exclusivity:
a person cannot appear more than once in a photo
Pairwise co-appearance: people appearing together
bear a higher likelihood of appearing together again
Groups of people: use data mining to
discover frequently appearing social patterns
Body Detection and Recognition
… when faces are obscured or invisible
Detect upper and lower body parts
Bipartite matching of faces and bodies
Graph-based fusion of faces and clothing
[Figure: pairwise Markov Network with face nodes f1, f2, f3; unary potentials are attached to each node and pairwise potentials to the edges between faces.]
Aim
Resolve identities of people primarily by their faces
Incorporate rich contextual cues of personal photo collections
where few individual people frequently appear together
Perform recognition by considering all contextual information
at the same time (unlike traditional approaches that usually
train a classifier and then predict identities independently)
Unary potential: u(w_n) = (1/Z) · f_f(w_n)
(f_f: face-similarity likelihood for label w_n; Z: normalization constant)
Experiments
Public Gallagher Dataset: ~600 photos, ~800 faces, 32 distinct people
Our dataset: ~3,300 photos, ~5,000 faces, 106 distinct people
All photos shot with a typical consumer camera
Considering only correctly detected faces (87%)
[Figure: training (Tr) and test (Te) sample topologies. Baseline: all samples are independent and classified by face similarity alone. Graph-based model: a unary potential for every node, with edges based on face similarities. Fusion model: a unary potential for every node, with edges based on face, upper-body and lower-body similarities.]
Pairwise potential:
p(w_n, w_m) = τ,             if w_n = w_m and i_n ≠ i_m
p(w_n, w_m) = 0,             if w_n = w_m and i_n = i_m
p(w_n, w_m) = co(w_n, w_m),  otherwise
(i_n: the photo in which face n appears; co: pairwise co-appearance likelihood)
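In code, the exclusivity and co-appearance behavior of the pairwise potential might look like the sketch below; the value of τ and the co function are illustrative assumptions:

```python
TAU = 0.8  # reward for matching labels across different photos (illustrative value)

def pairwise_potential(w_n, w_m, photo_n, photo_m, co):
    """Exclusivity: the same person (w_n == w_m) appearing twice in the same
    photo gets potential 0; the same person across different photos gets tau;
    otherwise fall back to the co-appearance likelihood co(w_n, w_m)."""
    if w_n == w_m:
        return 0.0 if photo_n == photo_m else TAU
    return co(w_n, w_m)
```

MAP inference over these potentials (via Loopy Belief Propagation) then assigns all identities jointly.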
[Chart: recognition gain at 3% training data (scale 0% to 25%) when successively adding the graphical model, social semantics and body parts.]
[Figure: LBP texture histograms are computed for each block of the normalized face.]
PicAlert!: A System for Privacy-Aware Image Classification and Retrieval
Sergej Zerr, Stefan Siersdorfer, Jonathon Hare
E-mail: {zerr,siersdorfer}@L3S.de, [email protected]
A large portion of images published in social Web 2.0 applications are of a highly sensitive nature, disclosing many details of the users’ private life. We have developed a web service which can detect private images within a user’s photo stream and provide support in making privacy decisions in the sharing context. In addition, we present a privacy-oriented image search application which automatically identifies potentially sensitive images in the result set and separates them from the remaining pictures.
1. Acquiring the Ground Truth Using a Social Annotation Game
81 users annotated 37,535 recent images:
27,405 public
4,701 private
Common notion of “privacy”: “Private are photos which have to do with the private sphere (like self portraits, family, friends, your home) or contain objects that you would not share with the entire world (like a private email). The rest are public. In case no decision can be made, the picture should be marked as undecidable.”
2. Feature Extraction
4. Search & Web Service
Top-50 stemmed terms according to their
Mutual Information values for
“public” vs. “private” photos in Flickr
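Ranking terms by their mutual information with the public/private label can be sketched from scratch as below; the toy documents are illustrative, and the paper's exact preprocessing (stemming, smoothing) is not reproduced here:

```python
import math
from collections import Counter

def mutual_information(docs):
    """docs: list of (set_of_terms, label). Rank terms by the mutual
    information I(term; class) between term occurrence and the label."""
    n = len(docs)
    label_counts = Counter(label for _, label in docs)
    term_label, term_counts, vocab = Counter(), Counter(), set()
    for terms, label in docs:
        for t in terms:
            vocab.add(t)
            term_counts[t] += 1
            term_label[(t, label)] += 1
    scores = {}
    for t in vocab:
        mi = 0.0
        for present in (True, False):
            for label, n_label in label_counts.items():
                if present:
                    joint = term_label[(t, label)] / n
                    p_t = term_counts[t] / n
                else:
                    joint = (n_label - term_label[(t, label)]) / n
                    p_t = 1 - term_counts[t] / n
                p_l = n_label / n
                if joint > 0:  # 0 * log 0 contributes nothing
                    mi += joint * math.log2(joint / (p_t * p_l))
        scores[t] = mi
    return sorted(scores, key=scores.get, reverse=True)
```

Terms that occur almost exclusively in one class rise to the top of the ranking.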
Web service GUI for privacy-oriented image classification. Search results for the query “cristiano ronaldo” (06/06/12).
GUI of the game.
3. Classifier Training
Evaluation: P/R curves for the features and their combination.
BEP (break-even point): Visual 0.74, Text 0.78, Combination 0.80
Visual features: Colors, SIFT, Edges, Faces
www.cubrikproject.eu
Active-learning loop:
1. Initial DSParams and threshold; Pcand = ∅
2. Identify pairs with ADS = threshold ± ε; sample and add to Pcand
3. Get crowd labels for Pcand
4. High-confidence pairs => Ptrain; Pcand = Pcand − Ptrain
5. Identify duplicate pairs from Ptrain => Pdupl
6. Compute crowd decisions and worker confidences
7. Optimize DSParams and threshold to fit the data in Ptrain => better DSParams and threshold; repeat from step 2
Map to Humans and Reduce Error - Crowdsourcing for Deduplication Applied to Digital Libraries
• Compare CD to AD and optimize DSParams and threshold to maximize accuracy
• Compare ADS to CSD and optimize DSParams:
  o minimize the sum of errors
  o minimize the sum of logs of errors
  o maximize the Pearson correlation
• Compare CD to AD and optimize threshold to maximize accuracy
• Find duplicate entities based on metadata
• Focus on scientific publications in the Freesearch system
• An automatic method and human labelers work together towards improving their performance at identifying duplicate entities
• Actively learn how to deduplicate from the crowd by optimizing the parameters of the automatic method
• MTurk HITs to get labeled data, while tackling the quality issues of the crowdsourced work
• DuplicatesScorer produces an ADS
• DSParams = {(fieldName, fieldWeight)} and threshold
• Compare ADS to threshold => AD ∈ {1,0}
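The DuplicatesScorer interface described above can be sketched as a weighted combination of per-field similarities; the word-overlap field similarity below is an illustrative assumption, since the actual scorer is not specified on the poster:

```python
def field_similarity(a, b):
    """Toy per-field similarity: Jaccard overlap of lowercased words
    (a stand-in for the real DuplicatesScorer's field comparison)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def duplicates_score(rec1, rec2, ds_params):
    """ADS: weighted combination of per-field similarities.
    ds_params: {field_name: field_weight}."""
    total = sum(ds_params.values())
    return sum(w * field_similarity(rec1.get(f, ""), rec2.get(f, ""))
               for f, w in ds_params.items()) / total

def automatic_decision(rec1, rec2, ds_params, threshold):
    """AD in {1, 0}: duplicate iff the ADS reaches the threshold."""
    return 1 if duplicates_score(rec1, rec2, ds_params) >= threshold else 0
```

Optimizing DSParams and the threshold against the crowd labels is then a matter of searching over these field weights.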
Mihai Georgescu, Dang Duc Pham, Claudiu S. Firan, Julien Gaugaz, Wolfgang Nejdl
Optimization strategies
Crowd Decision Strategies
             | 3 workers |            5 workers
             |    MV     |  MV     Iter   Manual  Boost  Heur
Accuracy     |   79.19   | 80.00   79.73  80.00   78.92  79.73
Sum-Err      |   76.49   | 79.46   79.46  79.46   79.46  79.19
Sum-log-err  |   71.89   | 78.11   78.38  78.92   80.27  76.76
Pearson      |   73.24   | 79.46   79.46  80.54   79.46  81.08
Crowd Soft Decision (CSD): aggregation of all individual votes W_i,j(k) ∈ {-1,1} into a CSD ∈ [0,1]
• The aggregated decision from all workers for a pair produces a CSD
• A worker’s contribution to the CSD is proportional to the confidence c_k we have in them
• Compare the CSD to 0.5 => CD ∈ {1,0}
Worker Confidence
• Assess how reliable the individual workers are compared to the overall performance of the crowd
• Simple measure: the proportion of pairs that have the same label as the one assigned by the crowd
• Use an EM algorithm to iteratively compute the worker confidences:
  o compute the CSD
  o update c_k
Crowd Decision Strategies:
• MV: Majority Voting; all workers are equal, c_k = 1
• Iter: c_k computed using the EM algorithm
• Boost: c_k computed using the EM algorithm with boosted weights in the computation of the CSD
• Heur: heuristic 3/3 or 4/5
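The EM loop (compute crowd decisions, re-estimate worker confidences, repeat) can be sketched as below; the initialization and iteration count are assumptions for illustration:

```python
def crowd_soft_decision(votes, confidence):
    """votes: {worker: vote in {-1, 1}} for one pair; returns CSD in [0, 1],
    weighting each worker's vote by our confidence in them."""
    total = sum(confidence[k] for k in votes)
    weighted = sum(confidence[k] * v for k, v in votes.items()) / total
    return (1 + weighted) / 2  # map [-1, 1] onto [0, 1]

def em_worker_confidence(pairs, iterations=10):
    """pairs: list of vote dicts, one per candidate pair.
    Iteratively: compute crowd decisions, then re-estimate each worker's
    confidence as the fraction of their votes agreeing with the crowd."""
    workers = {k for votes in pairs for k in votes}
    confidence = {k: 1.0 for k in workers}  # start from plain majority voting
    for _ in range(iterations):
        decisions = [1 if crowd_soft_decision(v, confidence) >= 0.5 else -1
                     for v in pairs]
        for k in workers:
            judged = [(v[k], d) for v, d in zip(pairs, decisions) if k in v]
            agree = sum(1 for vote, d in judged if vote == d)
            confidence[k] = agree / len(judged) if judged else 1.0
    return confidence
```

Workers who consistently disagree with the crowd end up with low confidence and thus little weight in later decisions.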
CSD_i,j = (1 + Σ_k weight(k) · W_i,j(k)) / 2,   with weight(k) = c_k / Σ_{v ∈ W_i,j} c_v
Contact: Mihai Georgescu
email: [email protected]
L3S Research Center / Leibniz Universität Hannover
Appelstrasse 4, 30167 Hannover, Germany
phone: +49 511 762-19715
Results per duplicate detection strategy:
     sign   sign+DS/m  sign+DS/o  DS/m   DS/o   CD-MV
R    0.20   0.20       0.20       0.67   0.56   0.97
A    0.77   0.77       0.77       0.70   0.79   0.83
P    0.95   0.95       1.00       0.48   0.66   0.63
(R: recall, A: accuracy, P: precision)
Duplicate Detection Strategies
Crowd Decision
Automatic Method
Experiment Setup
• 3 batches:
  o 60 HITs with qualification test
  o 60 HITs without qualification test
  o 120 HITs without qualification test
• Just signatures: sign
• Just the DuplicatesScorer: DS/m, DS/o
• First compute signatures, then base the decision on the DuplicatesScorer: sign + DS/m, sign + DS/o
• Directly use the Crowd Decision obtained via Majority Voting: CD-MV
Crowdsourcing:
• 1 HIT = 5 pairs
• 5 ct / HIT
• 3 to 5 assignments
Crowd Decision and Optimization Strategies
Example HIT shown to workers:
[Show Diff] [Full Text]
Title: Comparing Heuristic, Evolutionary and Local Search Approaches to Scheduling
Authors: Soraya Rana, Adele E. Howe, L. Darrell Whitley, Keith Mathias
Venue: Proceedings of the Third International Conference on Artificial Intelligence Planning Systems, Menlo Park, CA
Publisher: The AAAI Press
Year: 1996
Language: English
Type: conference
Abstract: The choice of search algorithm can play a vital role in the success of a scheduling application. In this paper, we investigate the contribution of search algorithms in solving a real-world warehouse scheduling problem. We compare performance of three types of scheduling algorithms: heuristic, genetic algorithms and local search.
[Show Diff]
Title: Comparing Heuristic, Evolutionary and Local Search Approaches to Scheduling.
Authors: Soraya B. Rana, Adele E. Howe, L. Darrell Whitley, Keith E. Mathias
Book: AIPS, pg. 174-181 [Contents]
Year: 1996  Language: English  Type: conference (inproceedings)
After carefully reviewing the publication metadata presented to you, how would you classify the two publications referred to?
Judgment for publications pair:
o Duplicates
o Not Duplicates
dblp.kbs.uni-hannover.de
Contact: Ernesto Diaz-Aviles, Mihai Georgescu
email: {diaz, georgescu}@L3S.de
Swarming to Rank for Recommender Systems
Ernesto Diaz-Aviles, Mihai Georgescu, and Wolfgang Nejdl
• Address the item recommendation task in the context of recommender systems
• An approach to learning ranking functions exploiting collaborative latent factors as features
• Instead of manually creating an item feature vector, factorize a matrix of user-item interactions
• Use these collaborative latent factors as input to
the Swarm Intelligence (SI) ranking method SwarmRank
Overview
Evaluation
SI for Recommender Systems
Dataset: real-world data from an internet radio service,
the 5-core of the Last.fm Dataset – 1K Users:
Transactions: 242,103
Unique users: 888
Items (artists): 35,315
Swarm-RankCF:
• a collaborative learning-to-rank algorithm based on SI
• while learning-to-rank algorithms typically use hand-picked features to represent items, we learn such features from user-item interactions and apply a PSO-based optimization algorithm that directly maximizes Mean Average Precision
Evaluation Methodology: all-but-one protocol (leave-one-out holdout):
Recall@N = (1 / |Users|) · Σ_u hit(u)
where hit(u) = 1 if the hidden item i is present in u’s Top-N list of recommendations, and 0 otherwise.
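With hit(u) as defined above, the metric is straightforward to compute; the data layout below is an illustrative assumption:

```python
def recall_at_n(recommendations, hidden, n=10):
    """All-but-one protocol: one item per user is hidden at training time.
    recommendations: {user: ranked list of item ids}
    hidden: {user: the held-out item}
    hit(u) = 1 iff the hidden item appears in u's top-N list."""
    hits = sum(1 for u, item in hidden.items()
               if item in recommendations.get(u, [])[:n])
    return hits / len(hidden)
```

The same loop yields precision@N by dividing each hit by N instead.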
L3S Research Center / Leibniz Universität Hannover
Appelstrasse 4, 30167 Hannover, Germany
phone: +49 511 762-19715
LikeLines: Collecting Timecode-level Feedback
for Web Videos through User Interactions
Raynor Vliegendhart, Martha Larson, and Alan Hanjalic
20th ACM international conference on Multimedia, Nara, Japan, 2012
Future Work
● For what kinds of video is timecode-level feedback useful?
● How should user interactions be interpreted?
● How to fuse timecode-level feedback with content analysis
without encouraging snowball effects?
● Can timecode-level data be linked to queries to recommend
relevant jump points?
● How to collect a critical mass of timecode-level data by
incentivizing users to interact with the system?
Contact: [email protected]
Implementation
● Video player component implemented in JavaScript and HTML5.
● Out-of-the-box support for YouTube and HTML5 videos.
● Video player component communicates with a back-end server
using JSON(P).
● Back-end server reference implementation is written in Python.
Multimedia Information Retrieval Lab, Delft University of Technology
@ShinNoNoir
Problem
● Problem: Providing users with a navigable heat map of interesting
regions of the video they are watching.
● Motivation: Conventional time sliders do not make the inner structure
of the video apparent, making it hard to navigate to the interesting bits.
Approach
A Web video player component with a navigable heat map that:
● Uses multimedia content analysis to seed the heat map.
● Captures implicit and explicit user feedback at the timecode-level
to refine the heat map.
System Overview
● Video player component, augmented with:
● Navigable heat map that allows users to jump directly to “hot”
areas;
● Time-sensitive “like” button that allows users to explicitly like
particular points in the video.
● Captures user interactions:
● Implicit feedback such as playing, pausing and seeking;
● Explicit “likes” expressed by the user.
● Combines content analysis and captured user interactions to compute
a video’s heat map.
● Back-end interaction session server stores and aggregates per video:
● All interaction sessions between each user and player;
● Initial multimedia content analysis of the video.
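The per-video aggregation on the back end (whose reference implementation is in Python) could be sketched as follows; the binning and weighting scheme is an assumption for illustration, not the actual LikeLines implementation:

```python
def heatmap(duration, content_scores, sessions, bins=100, like_weight=2.0):
    """Aggregate a video's heat map from content analysis and user feedback.
    content_scores: [(timecode_seconds, score)] from multimedia content analysis
    sessions: interaction events such as ("like", t), ("play", t), ("seek", t)."""
    bin_size = duration / bins
    heat = [0.0] * bins

    def add(t, weight):
        idx = min(int(t / bin_size), bins - 1)
        heat[idx] += weight

    for t, score in content_scores:        # seed with content analysis
        add(t, score)
    for kind, t in sessions:               # refine with user interactions
        add(t, like_weight if kind == "like" else 1.0)
    peak = max(heat) or 1.0
    return [h / peak for h in heat]        # normalize to [0, 1] for display
```

The player then renders this normalized vector as the navigable heat map along the time slider.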
Source code:
https://github.com/delftmir/likelines-player
[Diagram: viewers interact with the LikeLines player (play, pause, seek, like); interactions flow to the interaction session server, where the content analysis curve and the user feedback curves 1…n over time t (s) are summed into the video’s heat map.]
<script type="text/javascript">
var player = new LikeLines.Player('playerDiv', {
video: 'http://www.youtube.com/watch?v=wPTilA0XxYE',
backend: 'http://backend:9090/'
});
</script>
One of These Things is Not Like the Other:
Crowdsourcing Semantic Similarity of Multimedia Files
Raynor Vliegendhart*, Martha Larson*, and Johan Pouwelse**
ICT.OPEN 2012, Rotterdam, The Netherlands, 2012
Problem
● Problem: What constitutes a near duplicate?
For example: Are these two files the same? Why (not)?
Yes: It’s the same song.
No: These are different performances by different performers.
Definition:
Functional near-duplicate multimedia items are items that fulfill the
same purpose for the user. Once the user has one of these items,
there is no additional need for another.
● Task: Discovering new notions of user-perceived similarity between
multimedia files in a file-sharing setting.
● Motivation: Clustering items in search results.
Approach
● Idea: Point the odd one out, inspired by Sesame Street’s
“one of these things is not like the other”.
● Crowdsourcing Task:
● 3 multimedia files displayed as search results
● Worker points the odd one out and justifies why.
● Challenge: Eliciting serious judgments
Contact: [email protected]
Multimedia Information Retrieval Lab*
Delft University of Technology
Parallel and Distributed Systems Group**
Delft University of Technology
@ShinNoNoir
Chrono Cross - 'Dream of the
Shore Near Another World'
Violin/Piano Cover
(YouTube: IQYNEj51EUI)
Chrono Cross Dream of the
Shore Near Another World
Violin and Piano
(YouTube: Iuh3YrJtK3M)
Screenshots from Tribler 5.4 (tribler.org)
HIT Design
Amazon Mechanical Turk (AMT) is a crowdsourcing platform
to which Human Intelligence Tasks (HITs) can be submitted.
Phrasing in our HIT is important in order to elicit serious judgments:
● “Imagine that you download the three items in the list and that
you view them.”
● Don’t force workers to make a contrast, and
● Explain the definition of functional similarity.
Harry Potter and the Sorcerers Stone Audio
Book (478 MB)
Harry Potter and the Sorcerer s Stone
(2001)(ENG GER NL) 2Lions- (4.36 GB)
Harry Potter.And.The.Sorcerer.Stone.DVDR.
NTSC.SKJACK.Universal.S (4.46 GB)
o The items are comparable. They are for all practical purposes the
same. Someone would never really need all three of these.
o Each item can be considered unique. I can imagine that someone
might really want to download all three of these items.
o One item is not like the other two. (Please mark that item in the list.)
The other two items are comparable.
Experiments
● Dataset:
● Popular file-sharing site: The Pirate Bay (thepiratebay.se).
● 75 queries derived from Top 100 list.
● 32,773 filenames and metadata.
● 1000 random triads sampled from search results.
● Crowdsourcing Experiment:
● Recruitment HIT and Main HIT run concurrently on AMT.
● 8 out of 14 qualified workers produced free-text judgments
for 308 triads within 36 hours.
● Card Sort:
● Group similar judgments into piles,
merge piles iteratively, and, finally
label each pile.
● End result: 44 user-perceived
dimensions of similarity discovered.
Conclusion
● Wealth of user-perceived dimensions of similarity discovered.
● Quick results due to interesting crowdsourcing task.
[Chart: cosine similarity (range 0.50 to 0.80) between emotion vectors, comparing AMT workers vs. moviegoers and YouTube comments vs. moviegoers, computed over adjectives and over nouns.]
adjectives
nouns
Mining Emotions in Short Films: User Comments or Crowdsourcing?
Task
• Extract emotions in short films
• Exploit film criticism expressed through YouTube comments
Emotion detection approach [2]:
1. Create a profile for each short film
2. Extract the terms from the profile
3. Associate to each term an emotion and polarity
4. Compute the emotion vector and polarity
Emotion lexicon
Motivation
• Emotions are everywhere
• Many applications and diverse disciplines can benefit from mining emotions
Human-provided word-emotion association ratings annotated according to Plutchik’s psychoevolutionary theory (NRC Emotion Lexicon – EmoLex) [1]
[Diagram: comments (c1, c2, …, cn) on short films from Tropfest and YOUR FILM FESTIVAL are matched against EmoLex to build a short-film profile, yielding an emotion and polarity vector; parallel emotion and polarity vectors are obtained via Amazon Mechanical Turk (and its Sandbox) for comparison.]
Cosine similarity between the emotional vectors built from expert judgments and the ones built (i) through crowdsourcing using AMT, and (ii) automatically using YouTube comments.
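Building an emotion vector from comments via a term-emotion lexicon and comparing two such vectors by cosine similarity can be sketched as below; the tiny lexicon and the four-emotion subset of Plutchik's wheel are illustrative, not actual EmoLex data:

```python
import math

EMOTIONS = ["joy", "sadness", "anger", "fear"]  # subset of Plutchik's eight

def emotion_vector(comments, lexicon):
    """Sum the lexicon's term->emotion associations over all comment terms.
    lexicon: {term: {emotion: 0 or 1}}, EmoLex-style."""
    vec = [0.0] * len(EMOTIONS)
    for comment in comments:
        for term in comment.lower().split():
            for i, emo in enumerate(EMOTIONS):
                vec[i] += lexicon.get(term, {}).get(emo, 0)
    return vec

def cosine(u, v):
    """Cosine similarity between two emotion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

The evaluation then boils down to cosine(expert_vector, crowd_or_comment_vector) per film.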
Claudia Orellana-Rodriguez
Ernesto Diaz-Aviles
Wolfgang Nejdl
[Figure: Plutchik’s Wheel of Emotions]
Claudia Orellana-Rodriguez
L3S Research Center
e-mail: [email protected]
[1] S. M. Mohammad and P. D. Turney, “Crowdsourcing a word-emotion association lexicon,” Computational Intelligence, 2011.
[2] E. Diaz-Aviles, C. Orellana-Rodriguez, and W. Nejdl, “Taking the Pulse of Political Emotions in Latin America Based on Social Web Streams,” in LA-WEB, 2012.