Multimedia Information Retrieval:
Bytes and pixels meet the challenges of human media interpretation
Martha Larson
Delft University of Technology and Radboud University Nijmegen
29 June 2016, Communication Science, Radboud University Nijmegen
About me
● Where do I work?
○ TU Delft: Multimedia Computing Group
○ Radboud University: Multimedia Information Technology
● What do I do?
○ Background: Speech and language,
○ Research: Multimedia retrieval and recommender systems,
○ Emphasis: How people interpret and use multimedia.
● What am I doing today?
○ Sharing with you potential and open issues.
Today’s topics
● Introducing intelligent information systems
○ Multimedia information retrieval (user is active)
○ Recommender systems (user is passive)
● Computer Science and Multimedia
○ The “love” relationship: lots of data
○ The “hate” relationship: people’s interpretation of media is not “neat”!
● How to move forward?
○ Benchmarking challenges
Intelligent Information Systems
● Connect users with information,
● Information: digital content, facts, products, services,
● Include search engines and recommender systems,
● Success is judged by satisfaction of user needs.
Information retrieval
Definition: Information retrieval (IR) is finding material of an unstructured nature that satisfies an information need from within large collections. http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
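To make the definition concrete, here is a minimal sketch of retrieval over a tiny text collection. It assumes a simple smoothed TF-IDF score; the function names `tokenize` and `rank` and the scoring formula are illustrative choices, not something taken from the talk or the cited book.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return document indices ordered by a simple TF-IDF match with the query."""
    doc_tokens = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in doc_tokens for term in set(tokens))
    scored = []
    for idx, tokens in enumerate(doc_tokens):
        tf = Counter(tokens)
        # Smoothed IDF (an assumption): terms found in every document score 0.
        score = sum(
            tf[term] * math.log((1 + n_docs) / (1 + df[term]))
            for term in tokenize(query)
        )
        scored.append((score, idx))
    # Highest score first; ties keep the original document order.
    return [idx for score, idx in sorted(scored, key=lambda s: (-s[0], s[1]))]
```

Calling `rank("build koi pond", docs)` on a handful of short strings returns the document indices from best to worst match. Real systems add stemming, length normalization, and an inverted index.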
Recommender Systems
Definition: A recommender system tries to identify sets of items that are likely to be of interest to a certain user given some information from that user’s profile.
“Multimedia Clues” for the computer scientist
● Text: Things people write about images and videos.
● User interactions: What people click on, how long they watch.
● Pixel statistics: Colors, lines, textures, shot change patterns.
● Concept detection: Entities that can be detected in images and videos (faces can be detected well).
● Speech recognition: What is said in a video.
● Sound detection: Sounds that can be detected (laughter and gunshots can be detected well).
Visual Geo-location prediction
● Combine evidence from multiple images (e) taken in an area (Eg).
● Upweight elements that are distinctive for that particular area (WGeo).
Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf
Good match: Lots of what’s unique
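The two ideas on this slide (pool evidence from the images of an area, then upweight what is distinctive for that area) can be illustrated with a rough sketch. This is not the method of the cited paper: visual elements are reduced to hypothetical string identifiers, and distinctiveness is approximated with an IDF-style weight over areas. All names (`pool_area_elements`, `geo_weights`, `estimate_location`) are assumptions made for illustration.

```python
import math


def pool_area_elements(area_images):
    """Combine the visual elements of all images taken in each area."""
    return {area: set().union(*images) for area, images in area_images.items()}


def geo_weights(area_elements):
    """Weight each element by how few areas contain it (IDF-style assumption)."""
    n_areas = len(area_elements)
    counts = {}
    for elements in area_elements.values():
        for element in elements:
            counts[element] = counts.get(element, 0) + 1
    return {e: math.log(n_areas / c) for e, c in counts.items()}


def estimate_location(query_elements, area_images):
    """Score each area by the summed weight of elements shared with the query."""
    area_elements = pool_area_elements(area_images)
    weights = geo_weights(area_elements)
    scores = {
        area: sum(weights[e] for e in query_elements & elements)
        for area, elements in area_elements.items()
    }
    best_area = max(scores, key=scores.get)
    return best_area, scores
```

An element that appears in every area (a bicycle, say) gets weight zero, so only elements that are unusual for an area contribute to a good match.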
Conventional search engine finds “what”
Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.
I want a song called “koi pond”.
I’m interested in garden koi ponds.
Intent-aware search responds to “why”
I am interested in the significance of koi ponds.
I want to build a koi pond.
User intent in video search
Our study identified five major reasons why people search for videos online:
● Information (declarative knowledge)
● Experience for Learning (performative knowledge)
● Experience for Exposure (“being there”)
● Affect (change of mood)
● Object (video as video)
Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.
Why are video moments important?
R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015.
Viewer Expressive Reactions
Expressive reactions are not emotional in the classic sense.
They are also not completely personal... but...
The way people take a picture reflects what they are taking a picture of.
Pixel statistics reveal very simple information on how people take pictures.
We need people to judge if the computer guesses right.
Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM international conference on Multimedia (MM '14).
Fashion and framing
Characterize the trend...
Jacket types are already very difficult for computers!
Crowdsourcing
People interpret images in exchange for micropayments.
Example: Amazon Mechanical Turk
MediaEval 2016
Multimedia Benchmark Initiative
Moving forward with benchmarking
MediaEval Multimedia Evaluation Benchmark
● offers tasks on multimedia access and retrieval,
● exploits features derived from multiple modalities: speech, audio, visual content, tags, users, context,
● solutions may or may not involve machine learning.
multimediaeval.org
This year: the MediaEval workshop takes place right after ACM Multimedia 2016 in Amsterdam.
Example MediaEval Tasks
● Predicting Media Interestingness: Infer interesting frames and segments of movies (using audio, visual features, text).
● Retrieving Diverse Social Images: Diversify image results lists (text, visual features).
● Context of Multimedia Experience: Predict multimedia content suitable for watching in stressful situations.
● Person Discovery: Finding people in broadcast content.
● Placing: Geo-location estimation for social multimedia.
Publications arising from MediaEval
http://www.citeulike.org/group/16499
2015 Workshop Participants
80 participants from 25 countries
MediaEval Proceedings Papers
What sets MediaEval apart?
• … emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
• … innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
• … community driven.
Predicting Media Interestingness Task
Automatically select frames or portions of movies which are the most interesting for a common viewer.
● Goal: Make use of the visual, audio and text content (features provided).
● Data: Approximately 100 movie trailers, together with human annotations.
● Metric: System performance is to be evaluated using standard Mean Average Precision.
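For readers unfamiliar with the metric, Mean Average Precision can be computed as in the sketch below. It assumes binary relevance judgments and one ranked list per query (here, per trailer); the function names and the item identifiers are illustrative, and the official task evaluation may differ in details such as cut-offs.

```python
def average_precision(ranked_items, relevant):
    """Average of the precision values at each rank where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of relevant items penalizes relevant items never retrieved.
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of the average precision over (ranked_items, relevant_set) pairs."""
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)
```

For the ranking `["a", "b", "c", "d"]` with relevant items `{"a", "c"}`, the precision values at the two hits are 1/1 and 2/3, giving an average precision of 5/6.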
Retrieving Diverse Social Images Task
This task addresses the problem of image search result diversification in the context of social media:
● Goal: refine a ranked list of Flickr photos retrieved with general purpose multi-topic queries using provided visual, textual and user tagging credibility information.
● Metrics: results are evaluated for their relevance to the query and for how diversely they represent it.
● Data: ~40k images, social metadata, text models, CNN descriptors, a user tagging credibility dataset, etc.
Three data sets have been published at the MMSys dataset track.
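The slide names the two evaluation criteria (relevance and diversity) without giving formulas. One common way to measure them is precision at a cut-off k and cluster recall at k, sketched below. The choice of these two measures, the function names, and the photo and cluster identifiers are assumptions made for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant to the query."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def cluster_recall_at_k(ranked, clusters, k):
    """Fraction of ground-truth clusters represented in the top-k results.

    `clusters` maps each relevant item to the id of the cluster it belongs to.
    """
    all_clusters = set(clusters.values())
    found = {clusters[item] for item in ranked[:k] if item in clusters}
    return len(found) / len(all_clusters)
```

A list that repeats near-identical relevant photos can score well on precision but poorly on cluster recall, which is the gap that diversification aims to close.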
Retrieving Diverse Social Images Task (cont.)
[Figure: initial retrieval results compared with diversified results]
Context of Multimedia Experience Task
Develops multimodal techniques for automatically predicting which multimedia content suits a particular consumption context.
● Goal: Predict movies that are suitable to watch on airplanes.
● Data: Input to the prediction methods is movie trailers, and metadata from IMDb, Rotten Tomatoes and Metacritic.
● Metric: Output is evaluated using the Weighted F1 score, with expert labels as ground truth.
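The slide does not spell out how the Weighted F1 score is computed. A common reading is the per-class F1 averaged with weights proportional to each class's share of the ground truth, which is what the sketch below assumes. The function name and the labels "good" and "bad" are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        f1 = 2 * tp / denom if denom else 0.0
        score += (count / total) * f1
    return score
```

With expert labels `["good", "good", "good", "bad"]` and predictions `["good", "good", "bad", "bad"]`, the per-class F1 values are 0.8 and 2/3, and the support weights are 0.75 and 0.25.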
This year: Task is offered at the MediaEval workshop and at a joint-challenge workshop at http://www.icpr2016.org
Context of Multimedia Experience Task
Different contexts can lead to different preferences...
...people like to watch different movies than they would at home or in the cinema.
Multimodal Person Discovery in Broadcast TV Task
● Goal: Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of the people who can be both seen and heard in the shot.
● The list of people is not known a priori and their names must be discovered in an unsupervised way from provided text overlay or speech transcripts.
● Data: Multilingual corpus from INA (French), DW (German & English) and UPC (Catalan)
● Metric: standard information retrieval metrics based on a posteriori collaborative annotation of the corpus by the participants themselves.
Person Discovery Task
Person names must be discovered in speech track and/or sub-titles. Models cannot be trained on external data.
Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015
Tackling the Person Discovery Task
Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015
Wrap Up
● We want to connect users with information, in order to satisfy information needs.
● CS Love: Lots of data!
● CS Hate: How do people really see multimedia, what do they want?
● Way forward: Continue to define new challenges and build algorithms to address them.
Beyond the user-item matrix
CrowdRec project
● Exploiting multiple sources of information,
● Leveraging the Crowd (crowdworkers, users, curators),
● Evaluating at large scale.
Context-driven Recommender systems:
“People have more in common with other people in the same situation than they do with past versions of themselves”
Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, and Massimo Quadrana. The Contextual Turn: from Context-aware to Context-driven recommender systems. ACM RecSys 2016, to appear.
Turn from personalization
• Context has been taken into account by coupling it with personalization, with context-aware recommender systems
• However, being aware of the context is not enough for some domains: recommendations should be driven by the context
In traditional recsys, Immutable Preference paradigm (ImP):
• User tastes do not evolve
• Goals and needs are static
• Item catalog is static
• Trendiness, seasonality, capacity and life-cycle are addressed by tweaks to existing models
Slide credit: Roberto Pagano
Music
I usually like heavy metal music, but now I have to work and I want to listen to some soft music
Recommended for you:
Slide credit: Roberto Pagano
Jaeyoung Choi, Eungchan Kim, Martha Larson, Gerald Friedland, and Alan Hanjalic. 2015. Evento 360: Social Event Discovery from Web-scale Multimedia Collection. ACM Multimedia 2015, pp. 193-196.
Thank you
Mohammad Soleymani, Guillaume Gravier, Bogdan Ionescu, Gareth Jones, Claire-Helene Demarty, Ngoc Duong, Frédéric Lefebvre, Yu-Gang Jiang, Bogdan Ionescu, Mats Sjöberg, Hanli Wang, Toan Do, Richard Sutcliffe, Chris Fox, Richard Lewis, Tom Collins, Eduard Hovy, Deane L. Root, Igor Szoke, Xavier Anguera, Claude Barras, Hervé Bredin, Camille Guinaudeau, Jean Carrive, Yannick Estève, Javier Hernando, Juliette Kahn, Nam Le, Sylvain Meignier, Ramon Morros, Johann Poignant, Satoshi Tamura, Bart Thomee, Olivier Van Laere, Claudia Hauff, Jaeyoung Choi, Emmanuel Dellandréa, Liming Chen, Yoann Baveye, Mats Sjöberg, Christina Boididou, Symeon Papadopoulos, Stuart E. Middleton, Michael Riegler, Duc Tien, Dang Nguyen, Giulia Boato, Andreas Petlund, Michael Riegler, Concetto Spampinato, Bogdan Ionescu, Alexandru Lucian Gînscă, Maia Zaharieva, Mihai Lupu, Henning Müller, Adrian Popescu, Bogdan Boteanu, Alan Woodley, Shlomo Geva, Timothy Chappell, Richi Nayak, Gabi Constantin, Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, Massimo Quadrana, Xinchao Li, Alan Hanjalic, Andreas Lommatzsch, Benjamin Kille, Fabian Abel, Daniel Kohlsdorf, Jonas Seiler, Róbert Pálovics, Andras Benczur...
Links
● Challenges (Benchmarks)
○ MediaEval Multimedia Evaluation (http://multimediaeval.org),
○ CLEF NewsREEL News Recommendation challenge (http://www.clef-newsreel.org),
○ ACM RecSys 2016 Job Recommendation challenge (http://2016.recsyschallenge.com).
● Acknowledgements
○ Multimedia Commons (http://www.multimediacommons.org),
○ EC-funded CrowdRec project (http://crowdrec.eu).