Pictorial View of CUbRIK research from Posters
How Do We Deep-Link? Leveraging User-Contributed
Time-Links for Non-Linear Video Access
Raynor Vliegendhart, Babak Loni, Martha Larson, and Alan Hanjalic
21st ACM international conference on Multimedia, Barcelona, Spain, 2013
Future Work
● Improve automatic classification by adding content features
● Develop the envisioned deep-link retrieval system
Contact: [email protected]
Results
● Annotation agreement:
● VER/non-VER: 2,842 comments (84.6%)
● VERV: 2,140 comments (63.7%)
● Automatic classification results:
● Misclassification challenges:
● “Funny” comments often labeled as “here” by human annotators,
but classified as “love” by the classifier
● Comments with multiple interpretations
● Comments with multiple sentences
Multimedia Information Retrieval Lab, Delft University of Technology
@ShinNoNoir
Introduction
● Problem: How do users deep-link? (i.e., refer to points in a video by explicitly mentioning time-codes)
● Motivation: Leverage time-codes within deep-link comments
for enabling non-linear video access
● Dataset: MSRA-MM 2.0 / YouTube comments
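Detecting deep-link comments amounts to scanning comment text for explicit time-codes. A minimal sketch (the regex and helper name are illustrative assumptions, not from the paper):

```python
import re

# Matches time-codes such as "0:44", "2:11" or "1:02:15" (hours optional).
TIMECODE_RE = re.compile(r'\b(?:(\d{1,2}):)?(\d{1,2}):(\d{2})\b')

def extract_timecodes(comment):
    """Return every time-code in a comment as seconds from the video start."""
    times = []
    for hours, minutes, seconds in TIMECODE_RE.findall(comment):
        times.append(int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds))
    return times
```

A comment is then a deep-link candidate whenever extract_timecodes returns a non-empty list.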
Contributions
● Notion of Viewer Expressive Reaction (VER):
Reflects viewers’ perceptions of noteworthiness
(but extending beyond depicted content and induced affect)
● Viewer Expressive Reaction Variety taxonomy (VERV):
Captures how users deep-link;
Shown to be appropriate for automatic filtering
Approach
● Taxonomy elicitation via crowdsourcing (Amazon Mechanical Turk):
● Given: 3 deep-link comments per video
● Task: Describe why a comment was posted (2–4 sentences)
● Post-processing: Card-sorting technique
● Annotation crowdsourcing task, for:
● Validating elicited VERV taxonomy
● Annotating 3,359 deep-link comments:
● Whether it contains a true deep-link (VER/non-VER)
● VERV class (if and only if VER comment)
● Linear SVM comment classification experiment (unigram features)
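The posters give no implementation details for the classification experiment, so the sketch below uses a simple one-vs-rest perceptron over unigram counts as a pure-Python stand-in for the linear SVM (with real data one would use e.g. scikit-learn's LinearSVC on CountVectorizer unigram features); the toy comments and labels are illustrative, not actual VERV data:

```python
from collections import defaultdict

def unigrams(text):
    # Unigram features: lowercased whitespace tokens.
    return text.lower().split()

def train_perceptron(data, epochs=20):
    """Train a one-vs-rest perceptron on (comment, label) pairs."""
    labels = sorted({y for _, y in data})
    w = {y: defaultdict(float) for y in labels}
    for _ in range(epochs):
        for text, y in data:
            feats = unigrams(text)
            pred = max(labels, key=lambda l: sum(w[l][f] for f in feats))
            if pred != y:  # mistake-driven update
                for f in feats:
                    w[y][f] += 1.0
                    w[pred][f] -= 1.0
    return w, labels

def predict(model, text):
    w, labels = model
    feats = unigrams(text)
    return max(labels, key=lambda l: sum(w[l][f] for f in feats))
```

Training on a handful of labeled comments and predicting the class of a new one exercises the whole pipeline.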
Example deep-link comments (these occur unprompted on social video sharing platforms):
● “0:44 omg so cute”
● “i liked it till 2:11 then it became boring”
● “what’s the breed of the cat at 2:14?”
● “5:17 Damn!”
● “That move at 6:12 was dumb”
● “2:59 That didn’t go too well”
● “The song at 3:33 is called ‘The eye of the tiger’”
Envisioned Future Retrieval System
[Figure: mock-up of the envisioned deep-link retrieval system. Queries such as “cats at play +surprise” and “epic failure at 0:23” return videos (“Stalking cat” by lowdope, 4 years ago, 4,435 views, “Moire blog: http://moire.lowdope.com/”; “Wrestling kittens” by Rizzzalie, 5 years ago, 8,136 views, “World wrestling federation, young feline division”; “Funny cats in water” by NekoTV, 2 years ago, 3,491 views, “Funny cats in and around water”), each with a clickable deep-link jump point (1:14, 0:39, 3:24), alongside user comments such as “1:12 omg, that’s impossible!” (Nyaaa), “O_o now I didn’t expect that at all! 0:33” (Xfade), and “whoa. unreal creepy eyes at 1:02” (jb87).]
CONTEXT-BASED PEOPLE RECOGNITION
in CONSUMER PHOTO COLLECTIONS
Markus Brenner, Ebroul Izquierdo
MMV Research Group, School of Electronic Engineering and Computer Science
Queen Mary University of London, UK
{markus.brenner, ebroul.izquierdo}@eecs.qmul.ac.uk
Face Detection and Basic Recognition
Initial steps: Image preprocessing, face detection and face normalization
Descriptor-based: Local Binary Pattern (LBP) texture histograms
Similarity metric: Chi-Square Statistics
Basic face recognition: k-Nearest-Neighbor
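The basic recognition step (LBP histograms compared with the Chi-Square statistic, classified by k-Nearest-Neighbor) can be sketched as follows, assuming the per-face LBP histograms have already been computed; the gallery layout and function names are illustrative:

```python
def chi_square(h1, h2, eps=1e-10):
    """Chi-Square distance between two (normalized) LBP histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

def knn_identify(query_hist, gallery, k=3):
    """gallery: list of (person_id, lbp_histogram) pairs.
    Majority vote over the k nearest faces under Chi-Square distance."""
    nearest = sorted(gallery, key=lambda item: chi_square(query_hist, item[1]))[:k]
    votes = {}
    for person, _ in nearest:
        votes[person] = votes.get(person, 0) + 1
    return max(votes, key=votes.get)
```

In the full system this kNN decision only seeds the unary potentials; the graph-based model below refines it jointly.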
Graph-based Recognition
Model: pairwise Markov Network (graph nodes represent faces)
Unary Potentials: likelihood of faces belonging to
particular people
Pairwise Potentials: encourage spatial smoothness,
encode exclusivity constraint and temporal domain
Topology: only the most similar faces are
connected with edges
Inference: maximum a posteriori (MAP)
solution of Loopy Belief Propagation (LBP)
Social Semantics
Individual appearance for a more effective graph
topology (used to regularize the number of edges)
Unique People Constraint models exclusivity:
a person cannot appear more than once in a photo
Pairwise co-appearance: people appearing together
bear a higher likelihood of appearing together again
Groups of people: use data mining to
discover frequently appearing social patterns
Body Detection and Recognition
… when faces are obscured or invisible
Detect upper and lower body parts
Bipartite matching of faces and bodies
Graph-based fusion of faces and clothing
[Figure: pairwise Markov Network with face nodes f1, f2, f3; unary potentials are attached to each node and pairwise potentials to the edges between faces.]
Aim
Resolve identities of people primarily by their faces
Incorporate rich contextual cues of personal photo collections
where few individual people frequently appear together
Perform recognition by considering all contextual information
at the same time (unlike traditional approaches that usually
train a classifier and then predict identities independently)
Unary potential: u(w_n) = (1/Z) · f_f(w_n)
(f_f: face-similarity likelihood for label w_n; Z: normalization constant)
Experiments
Public Gallagher Dataset: ~600 photos, ~800 faces, 32 distinct people
Our dataset: ~3,300 photos, ~5,000 faces, 106 distinct people
All photos shot with a typical consumer camera
Considering only correctly detected faces (87%)
[Figure: training (Tr) and test (Te) sample topologies. Baseline: all samples are independent and classified by face similarity alone. Graph-based model: a unary potential for every node, with edges based on face similarities. Fusion model: a unary potential for every node, with edges based on face, upper-body and lower-body similarities.]
Pairwise potential:
p(w_n, w_m) = τ,             if w_n = w_m and i_n ≠ i_m
p(w_n, w_m) = 0,             if w_n = w_m and i_n = i_m
p(w_n, w_m) = co(w_n, w_m),  otherwise
(i_n: the photo in which face n appears; co: pairwise co-appearance likelihood)
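In code, the exclusivity and co-appearance behavior of the pairwise potential might look like the sketch below; the value of τ and the co function are illustrative assumptions:

```python
TAU = 0.8  # reward for matching labels across different photos (illustrative value)

def pairwise_potential(w_n, w_m, photo_n, photo_m, co):
    """Exclusivity: the same person (w_n == w_m) appearing twice in the same
    photo gets potential 0; the same person across different photos gets tau;
    otherwise fall back to the co-appearance likelihood co(w_n, w_m)."""
    if w_n == w_m:
        return 0.0 if photo_n == photo_m else TAU
    return co(w_n, w_m)
```

MAP inference over these potentials (via Loopy Belief Propagation) then assigns all identities jointly.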
[Chart: recognition gain at 3% training data (scale 0% to 25%) when successively adding the graphical model, social semantics and body parts.]
[Figure: LBP texture histograms are computed for each block of the normalized face.]
PicAlert!: A System for Privacy-Aware Image Classification and Retrieval
Sergej Zerr, Stefan Siersdorfer, Jonathon Hare
E-mail: {zerr,siersdorfer}@L3S.de, [email protected]
A large portion of images published in social Web 2.0 applications are of a highly sensitive nature, disclosing many details of the users’ private life. We have developed a web service which can detect private images within a user’s photo stream and provide support in making privacy decisions in the sharing context. In addition, we present a privacy-oriented image search application which automatically identifies potentially sensitive images in the result set and separates them from the remaining pictures.
1. Acquiring the Ground Truth Using a Social Annotation Game
81 users annotated 37,535 recent images:
27,405 public
4,701 private
Common notion of “privacy”: “Private are photos which have to do with the private sphere (like self portraits, family, friends, your home) or contain objects that you would not share with the entire world (like a private email). The rest are public. In case no decision can be made, the picture should be marked as undecidable.”
2. Feature Extraction
4. Search & Web Service
Top-50 stemmed terms according to their
Mutual Information values for
“public” vs. “private” photos in Flickr
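Ranking terms by their mutual information with the public/private label can be sketched from scratch as below; the toy documents are illustrative, and the paper's exact preprocessing (stemming, smoothing) is not reproduced here:

```python
import math
from collections import Counter

def mutual_information(docs):
    """docs: list of (set_of_terms, label). Rank terms by the mutual
    information I(term; class) between term occurrence and the label."""
    n = len(docs)
    label_counts = Counter(label for _, label in docs)
    term_label, term_counts, vocab = Counter(), Counter(), set()
    for terms, label in docs:
        for t in terms:
            vocab.add(t)
            term_counts[t] += 1
            term_label[(t, label)] += 1
    scores = {}
    for t in vocab:
        mi = 0.0
        for present in (True, False):
            for label, n_label in label_counts.items():
                if present:
                    joint = term_label[(t, label)] / n
                    p_t = term_counts[t] / n
                else:
                    joint = (n_label - term_label[(t, label)]) / n
                    p_t = 1 - term_counts[t] / n
                p_l = n_label / n
                if joint > 0:  # 0 * log 0 contributes nothing
                    mi += joint * math.log2(joint / (p_t * p_l))
        scores[t] = mi
    return sorted(scores, key=scores.get, reverse=True)
```

Terms that occur almost exclusively in one class rise to the top of the ranking.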
Web service GUI for privacy-oriented image classification. Search results for the query “cristiano ronaldo” (06/06/12).
GUI of the game.
3. Classifier Training
Evaluation: P/R curves for the features and their combination.
BEP (break-even point): Visual 0.74, Text 0.78, Combination 0.80
Visual features: Colors, SIFT, Edges, Faces
www.cubrikproject.eu
Active-learning loop:
1. Initial DSParams and threshold; Pcand = ∅
2. Identify pairs with ADS = threshold ± ε; sample and add to Pcand
3. Get crowd labels for Pcand
4. High-confidence pairs => Ptrain; Pcand = Pcand − Ptrain
5. Identify duplicate pairs from Ptrain => Pdupl
6. Compute crowd decisions and worker confidences
7. Optimize DSParams and threshold to fit the data in Ptrain => better DSParams and threshold; repeat from step 2
Map to Humans and Reduce Error - Crowdsourcing for Deduplication Applied to Digital Libraries
• Compare CD to AD and optimize DSParams and threshold to maximize accuracy
• Compare ADS to CSD and optimize DSParams:
  o minimize the sum of errors
  o minimize the sum of logs of errors
  o maximize the Pearson correlation
• Compare CD to AD and optimize threshold to maximize accuracy
• Find duplicate entities based on metadata
• Focus on scientific publications in the Freesearch system
• An automatic method and human labelers work together towards improving their performance at identifying duplicate entities
• Actively learn how to deduplicate from the crowd by optimizing the parameters of the automatic method
• MTurk HITs to get labeled data, while tackling the quality issues of the crowdsourced work
• DuplicatesScorer produces an ADS
• DSParams = {(fieldName, fieldWeight)} and threshold
• Compare ADS to threshold => AD ∈ {1,0}
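The DuplicatesScorer interface described above can be sketched as a weighted combination of per-field similarities; the word-overlap field similarity below is an illustrative assumption, since the actual scorer is not specified on the poster:

```python
def field_similarity(a, b):
    """Toy per-field similarity: Jaccard overlap of lowercased words
    (a stand-in for the real DuplicatesScorer's field comparison)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def duplicates_score(rec1, rec2, ds_params):
    """ADS: weighted combination of per-field similarities.
    ds_params: {field_name: field_weight}."""
    total = sum(ds_params.values())
    return sum(w * field_similarity(rec1.get(f, ""), rec2.get(f, ""))
               for f, w in ds_params.items()) / total

def automatic_decision(rec1, rec2, ds_params, threshold):
    """AD in {1, 0}: duplicate iff the ADS reaches the threshold."""
    return 1 if duplicates_score(rec1, rec2, ds_params) >= threshold else 0
```

Optimizing DSParams and the threshold against the crowd labels is then a matter of searching over these field weights.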
Mihai Georgescu, Dang Duc Pham, Claudiu S. Firan, Julien Gaugaz, Wolfgang Nejdl
Optimization strategies
Crowd Decision Strategies
             | 3 workers |            5 workers
             |    MV     |  MV     Iter   Manual  Boost  Heur
Accuracy     |   79.19   | 80.00   79.73  80.00   78.92  79.73
Sum-Err      |   76.49   | 79.46   79.46  79.46   79.46  79.19
Sum-log-err  |   71.89   | 78.11   78.38  78.92   80.27  76.76
Pearson      |   73.24   | 79.46   79.46  80.54   79.46  81.08
Crowd Soft Decision (CSD): aggregation of all individual votes W_i,j(k) ∈ {-1,1} into a CSD ∈ [0,1]
• The aggregated decision from all workers for a pair produces a CSD
• A worker’s contribution to the CSD is proportional to the confidence c_k we have in them
• Compare the CSD to 0.5 => CD ∈ {1,0}
Worker Confidence
• Assess how reliable the individual workers are compared to the overall performance of the crowd
• Simple measure: the proportion of pairs that have the same label as the one assigned by the crowd
• Use an EM algorithm to iteratively compute the worker confidences:
  o compute the CSD
  o update c_k
Crowd Decision Strategies:
• MV: Majority Voting; all workers are equal, c_k = 1
• Iter: c_k computed using the EM algorithm
• Boost: c_k computed using the EM algorithm with boosted weights in the computation of the CSD
• Heur: heuristic 3/3 or 4/5
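The EM loop (compute crowd decisions, re-estimate worker confidences, repeat) can be sketched as below; the initialization and iteration count are assumptions for illustration:

```python
def crowd_soft_decision(votes, confidence):
    """votes: {worker: vote in {-1, 1}} for one pair; returns CSD in [0, 1],
    weighting each worker's vote by our confidence in them."""
    total = sum(confidence[k] for k in votes)
    weighted = sum(confidence[k] * v for k, v in votes.items()) / total
    return (1 + weighted) / 2  # map [-1, 1] onto [0, 1]

def em_worker_confidence(pairs, iterations=10):
    """pairs: list of vote dicts, one per candidate pair.
    Iteratively: compute crowd decisions, then re-estimate each worker's
    confidence as the fraction of their votes agreeing with the crowd."""
    workers = {k for votes in pairs for k in votes}
    confidence = {k: 1.0 for k in workers}  # start from plain majority voting
    for _ in range(iterations):
        decisions = [1 if crowd_soft_decision(v, confidence) >= 0.5 else -1
                     for v in pairs]
        for k in workers:
            judged = [(v[k], d) for v, d in zip(pairs, decisions) if k in v]
            agree = sum(1 for vote, d in judged if vote == d)
            confidence[k] = agree / len(judged) if judged else 1.0
    return confidence
```

Workers who consistently disagree with the crowd end up with low confidence and thus little weight in later decisions.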
CSD_i,j = (1 + Σ_k weight(k) · W_i,j(k)) / 2,   with weight(k) = c_k / Σ_{v ∈ W_i,j} c_v
Contact: Mihai Georgescu
email: [email protected]
L3S Research Center / Leibniz Universität Hannover
Appelstrasse 4, 30167 Hannover, Germany
phone: +49 511 762-19715
Results per duplicate detection strategy:
     sign   sign+DS/m  sign+DS/o  DS/m   DS/o   CD-MV
R    0.20   0.20       0.20       0.67   0.56   0.97
A    0.77   0.77       0.77       0.70   0.79   0.83
P    0.95   0.95       1.00       0.48   0.66   0.63
(R: recall, A: accuracy, P: precision)
Duplicate Detection Strategies
Crowd Decision
Automatic Method
Experiment Setup
• 3 batches:
  o 60 HITs with qualification test
  o 60 HITs without qualification test
  o 120 HITs without qualification test
• Just signatures: sign
• Just the DuplicatesScorer: DS/m, DS/o
• First compute signatures, then base the decision on the DuplicatesScorer: sign + DS/m, sign + DS/o
• Directly use the Crowd Decision obtained via Majority Voting: CD-MV
Crowdsourcing:
• 1 HIT = 5 pairs
• 5 ct / HIT
• 3 to 5 assignments
Crowd Decision and Optimization Strategies
Example HIT shown to workers:
[Show Diff] [Full Text]
Title: Comparing Heuristic, Evolutionary and Local Search Approaches to Scheduling
Authors: Soraya Rana, Adele E. Howe, L. Darrell Whitley, Keith Mathias
Venue: Proceedings of the Third International Conference on Artificial Intelligence Planning Systems, Menlo Park, CA
Publisher: The AAAI Press
Year: 1996
Language: English
Type: conference
Abstract: The choice of search algorithm can play a vital role in the success of a scheduling application. In this paper, we investigate the contribution of search algorithms in solving a real-world warehouse scheduling problem. We compare performance of three types of scheduling algorithms: heuristic, genetic algorithms and local search.
[Show Diff]
Title: Comparing Heuristic, Evolutionary and Local Search Approaches to Scheduling.
Authors: Soraya B. Rana, Adele E. Howe, L. Darrell Whitley, Keith E. Mathias
Book: AIPS, pg. 174-181 [Contents]
Year: 1996  Language: English  Type: conference (inproceedings)
After carefully reviewing the publication metadata presented to you, how would you classify the two publications referred to?
Judgment for publications pair:
o Duplicates
o Not Duplicates
dblp.kbs.uni-hannover.de
Contact: Ernesto Diaz-Aviles, Mihai Georgescu
email: {diaz, georgescu}@L3S.de
Swarming to Rank for Recommender Systems
Ernesto Diaz-Aviles, Mihai Georgescu, and Wolfgang Nejdl
• Address the item recommendation task in the context of recommender systems
• An approach to learning ranking functions exploiting collaborative latent factors as features
• Instead of manually creating an item feature vector, factorize a matrix of user-item interactions
• Use these collaborative latent factors as input to
the Swarm Intelligence (SI) ranking method SwarmRank
Overview
Evaluation
SI for Recommender Systems
Dataset: real-world data from an internet radio service,
the 5-core of the Last.fm Dataset – 1K Users:
Transactions: 242,103
Unique users: 888
Items (artists): 35,315
Swarm-RankCF:
• a collaborative learning-to-rank algorithm based on SI
• while learning-to-rank algorithms typically use hand-picked features to represent items, we learn such features from user-item interactions and apply a PSO-based optimization algorithm that directly maximizes Mean Average Precision
Evaluation Methodology: all-but-one protocol (leave-one-out holdout):
Recall@N = (1 / |Users|) · Σ_u hit(u)
where hit(u) = 1 if the hidden item i is present in u’s Top-N list of recommendations, and 0 otherwise.
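With hit(u) as defined above, the metric is straightforward to compute; the data layout below is an illustrative assumption:

```python
def recall_at_n(recommendations, hidden, n=10):
    """All-but-one protocol: one item per user is hidden at training time.
    recommendations: {user: ranked list of item ids}
    hidden: {user: the held-out item}
    hit(u) = 1 iff the hidden item appears in u's top-N list."""
    hits = sum(1 for u, item in hidden.items()
               if item in recommendations.get(u, [])[:n])
    return hits / len(hidden)
```

The same loop yields precision@N by dividing each hit by N instead.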
L3S Research Center / Leibniz Universität Hannover
Appelstrasse 4, 30167 Hannover, Germany
phone: +49 511 762-19715
LikeLines: Collecting Timecode-level Feedback
for Web Videos through User Interactions
Raynor Vliegendhart, Martha Larson, and Alan Hanjalic
20th ACM international conference on Multimedia, Nara, Japan, 2012
Future Work
● For what kinds of video is timecode-level feedback useful?
● How should user interactions be interpreted?
● How to fuse timecode-level feedback with content analysis
without encouraging snowball effects?
● Can timecode-level data be linked to queries to recommend
relevant jump points?
● How to collect a critical mass of timecode-level data by
incentivizing users to interact with the system?
Contact: [email protected]
Implementation
● Video player component implemented in JavaScript and HTML5.
● Out-of-the-box support for YouTube and HTML5 videos.
● Video player component communicates with a back-end server
using JSON(P).
● Back-end server reference implementation is written in Python.
Multimedia Information Retrieval Lab, Delft University of Technology
@ShinNoNoir
Problem
● Problem: Providing users with a navigable heat map of interesting
regions of the video they are watching.
● Motivation: Conventional time sliders do not make the inner structure
of the video apparent, making it hard to navigate to the interesting bits.
Approach
A Web video player component with a navigable heat map that:
● Uses multimedia content analysis to seed the heat map.
● Captures implicit and explicit user feedback at the timecode-level
to refine the heat map.
System Overview
● Video player component, augmented with:
● Navigable heat map that allows users to jump directly to “hot”
areas;
● Time-sensitive “like” button that allows users to explicitly like
particular points in the video.
● Captures user interactions:
● Implicit feedback such as playing, pausing and seeking;
● Explicit “likes” expressed by the user.
● Combines content analysis and captured user interactions to compute
a video’s heat map.
● Back-end interaction session server stores and aggregates per video:
● All interaction sessions between each user and player;
● Initial multimedia content analysis of the video.
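The per-video aggregation on the back end (whose reference implementation is in Python) could be sketched as follows; the binning and weighting scheme is an assumption for illustration, not the actual LikeLines implementation:

```python
def heatmap(duration, content_scores, sessions, bins=100, like_weight=2.0):
    """Aggregate a video's heat map from content analysis and user feedback.
    content_scores: [(timecode_seconds, score)] from multimedia content analysis
    sessions: interaction events such as ("like", t), ("play", t), ("seek", t)."""
    bin_size = duration / bins
    heat = [0.0] * bins

    def add(t, weight):
        idx = min(int(t / bin_size), bins - 1)
        heat[idx] += weight

    for t, score in content_scores:        # seed with content analysis
        add(t, score)
    for kind, t in sessions:               # refine with user interactions
        add(t, like_weight if kind == "like" else 1.0)
    peak = max(heat) or 1.0
    return [h / peak for h in heat]        # normalize to [0, 1] for display
```

The player then renders this normalized vector as the navigable heat map along the time slider.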
Source code:
https://github.com/delftmir/likelines-player
[Diagram: viewers interact with the LikeLines player (play, pause, seek, like); interactions flow to the interaction session server, where the content analysis curve and the user feedback curves 1…n over time t (s) are summed into the video’s heat map.]
<script type="text/javascript">
var player = new LikeLines.Player('playerDiv', {
video: 'http://www.youtube.com/watch?v=wPTilA0XxYE',
backend: 'http://backend:9090/'
});
</script>
One of These Things is Not Like the Other:
Crowdsourcing Semantic Similarity of Multimedia Files
Raynor Vliegendhart*, Martha Larson*, and Johan Pouwelse**
ICT.OPEN 2012, Rotterdam, The Netherlands, 2012
Problem
● Problem: What constitutes a near duplicate?
For example: Are these two files the same? Why (not)?
Yes: It’s the same song.
No: These are different performances by different performers.
Definition:
Functional near-duplicate multimedia items are items that fulfill the
same purpose for the user. Once the user has one of these items,
there is no additional need for another.
● Task: Discovering new notions of user-perceived similarity between
multimedia files in a file-sharing setting.
● Motivation: Clustering items in search results.
Approach
● Idea: Point the odd one out, inspired by Sesame Street’s
“one of these things is not like the other”.
● Crowdsourcing Task:
● 3 multimedia files displayed as search results
● Worker points the odd one out and justifies why.
● Challenge: Eliciting serious judgments
Contact: [email protected]
Multimedia Information Retrieval Lab*
Delft University of Technology
Parallel and Distributed Systems Group**
Delft University of Technology
@ShinNoNoir
Chrono Cross - 'Dream of the
Shore Near Another World'
Violin/Piano Cover
(YouTube: IQYNEj51EUI)
Chrono Cross Dream of the
Shore Near Another World
Violin and Piano
(YouTube: Iuh3YrJtK3M)
Screenshots from Tribler 5.4 (tribler.org)
HIT Design
Amazon Mechanical Turk (AMT) is a crowdsourcing platform
to which Human Intelligence Tasks (HITs) can be submitted.
Phrasing in our HIT is important in order to elicit serious judgments:
● “Imagine that you download the three items in the list and that
you view them.”
● Don’t force workers to make a contrast, and
● Explain the definition of functional similarity.
Harry Potter and the Sorcerers Stone Audio
Book (478 MB)
Harry Potter and the Sorcerer s Stone
(2001)(ENG GER NL) 2Lions- (4.36 GB)
Harry Potter.And.The.Sorcerer.Stone.DVDR.
NTSC.SKJACK.Universal.S (4.46 GB)
o The items are comparable. They are for all practical purposes the
same. Someone would never really need all three of these.
o Each item can be considered unique. I can imagine that someone
might really want to download all three of these items.
o One item is not like the other two. (Please mark that item in the list.)
The other two items are comparable.
Experiments
● Dataset:
● Popular file-sharing site: The Pirate Bay (thepiratebay.se).
● 75 queries derived from Top 100 list.
● 32,773 filenames and metadata.
● 1000 random triads sampled from search results.
● Crowdsourcing Experiment:
● Recruitment HIT and Main HIT run concurrently on AMT.
● 8 out of 14 qualified workers produced free-text judgments
for 308 triads within 36 hours.
● Card Sort:
● Group similar judgments into piles,
merge piles iteratively, and, finally
label each pile.
● End result: 44 user-perceived
dimensions of similarity discovered.
Conclusion
● Wealth of user-perceived dimensions of similarity discovered.
● Quick results due to interesting crowdsourcing task.
[Chart: cosine similarity (range 0.50 to 0.80) between emotion vectors, comparing AMT workers vs. moviegoers and YouTube comments vs. moviegoers, computed over adjectives and over nouns.]
adjectives
nouns
Mining Emotions in Short Films: User Comments or Crowdsourcing?
Task
• Extract emotions in short films
• Exploit film criticism expressed through YouTube comments
Emotion detection approach [2]:
1. Create a profile for each short film
2. Extract the terms from the profile
3. Associate to each term an emotion and polarity
4. Compute the emotion vector and polarity
Emotion lexicon
Motivation
• Emotions are everywhere
• Many applications and diverse disciplines can benefit from mining emotions
Human-provided word-emotion association ratings annotated according to Plutchik’s psychoevolutionary theory (NRC Emotion Lexicon – EmoLex) [1]
[Diagram: comments (c1, c2, …, cn) on short films from Tropfest and YOUR FILM FESTIVAL are matched against EmoLex to build a short-film profile, yielding an emotion and polarity vector; parallel emotion and polarity vectors are obtained via Amazon Mechanical Turk (and its Sandbox) for comparison.]
Cosine similarity between the emotional vectors built from expert judgments and the ones built (i) through crowdsourcing using AMT, and (ii) automatically using YouTube comments.
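Building an emotion vector from comments via a term-emotion lexicon and comparing two such vectors by cosine similarity can be sketched as below; the tiny lexicon and the four-emotion subset of Plutchik's wheel are illustrative, not actual EmoLex data:

```python
import math

EMOTIONS = ["joy", "sadness", "anger", "fear"]  # subset of Plutchik's eight

def emotion_vector(comments, lexicon):
    """Sum the lexicon's term->emotion associations over all comment terms.
    lexicon: {term: {emotion: 0 or 1}}, EmoLex-style."""
    vec = [0.0] * len(EMOTIONS)
    for comment in comments:
        for term in comment.lower().split():
            for i, emo in enumerate(EMOTIONS):
                vec[i] += lexicon.get(term, {}).get(emo, 0)
    return vec

def cosine(u, v):
    """Cosine similarity between two emotion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

The evaluation then boils down to cosine(expert_vector, crowd_or_comment_vector) per film.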
Claudia Orellana-Rodriguez
Ernesto Diaz-Aviles
Wolfgang Nejdl
[Figure: Plutchik’s Wheel of Emotions]
Claudia Orellana-Rodriguez
L3S Research Center
e-mail: [email protected]
[1] S. M. Mohammad and P. D. Turney, “Crowdsourcing a word-emotion association lexicon,” Computational Intelligence, 2011.
[2] E. Diaz-Aviles, C. Orellana-Rodriguez, and W. Nejdl, “Taking the Pulse of Political Emotions in Latin America Based on Social Web Streams,” in LA-WEB, 2012.