Multimedia Information Retrieval: Bytes and pixels meet the challenges of human media interpretation Martha Larson Delft University of Technology and Radboud University Nijmegen 29 June 2016, Communication Science, Radboud University Nijmegen


About me

● Where do I work?
  ○ TU Delft: Multimedia Computing Group
  ○ Radboud University: Multimedia Information Technology

● What do I do?
  ○ Background: Speech and language,
  ○ Research: Multimedia retrieval and recommender systems,
  ○ Emphasis: How people interpret and use multimedia.

● What am I doing today?
  ○ Sharing with you potential and open issues.


Today’s topics

● Introducing intelligent information systems
  ○ Multimedia information retrieval (user is active)
  ○ Recommender systems (user is passive)

● Computer Science and Multimedia
  ○ The “love” relationship: lots of data
  ○ The “hate” relationship: people’s interpretation of media is not “neat”!

● How to move forward?
  ○ Benchmarking challenges


Intelligent Information Systems

● Connect users with information,
● Information: digital content, facts, products, services,
● Include search engines and recommender systems,
● Success is judged by satisfaction of user needs.


Information retrieval

Definition: Information retrieval (IR) is finding material of an unstructured nature that satisfies an information need from within large collections. http://nlp.stanford.edu/IR-book/html/htmledition/irbook.html
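To make the definition concrete, here is a minimal sketch of ranked retrieval over a tiny text collection. It assumes plain TF-IDF scoring with whitespace tokenization; the function names and the three example documents are made up for illustration and are not taken from the talk.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def rank(query, docs):
    """Return indices of matching documents, best TF-IDF score first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for toks in tokenized for term in set(toks))
    scored = []
    for index, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = sum(
            tf[t] * math.log(n_docs / df[t]) for t in tokenize(query) if t in df
        )
        if score > 0:
            scored.append((score, index))
    # Highest score first; ties are broken by document order.
    return [index for score, index in sorted(scored, key=lambda s: (-s[0], s[1]))]


docs = [
    "how to build a koi pond in your garden",
    "koi pond a song for piano",
    "weather forecast for Nijmegen",
]
print(rank("build koi pond", docs))  # prints [0, 1]
```

The rare term "build" carries more weight than "koi" or "pond", which occur in two of the three documents, so the how-to document is ranked first.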


Recommender Systems

Definition: A recommender system tries to identify sets of items that are likely to be of interest to a certain user given some information from that user’s profile.
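As an illustration of the definition, the sketch below recommends items from the profiles of similar users. It assumes that a profile is simply a set of items the user has consumed and that Jaccard overlap is the similarity measure; both are simplifying assumptions, and all names and data are hypothetical.

```python
def recommend(profile, other_profiles, k=2):
    """Suggest up to k unseen items, weighted by profile overlap (Jaccard)."""
    scores = {}
    for other in other_profiles:
        union = profile | other
        if not union:
            continue
        overlap = len(profile & other) / len(union)
        if overlap == 0:
            continue  # users with nothing in common contribute nothing
        for item in other - profile:
            scores[item] = scores.get(item, 0.0) + overlap
    # Highest accumulated overlap first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]


others = [
    {"koi pond video", "garden tour", "pond filter review"},
    {"garden tour", "pond filter review", "water lily guide"},
    {"metal concert", "drum solo"},
]
me = {"garden tour", "koi pond video"}
print(recommend(me, others))  # prints ['pond filter review', 'water lily guide']
```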


“Multimedia Clues” for the computer scientist

● Text: Things people write about images and videos.
● User interactions: What people click on, how long they watch.
● Pixel statistics: Colors, lines, textures, shot change patterns.
● Concept detection: Entities that can be detected in images and videos (faces can be detected well).
● Speech recognition: What is said in a video.
● Sound detection: Sounds that can be detected (laughter and gunshots can be detected well).


Visual Geo-location prediction

● Combine evidence from multiple images (e) taken in an area (Eg).

● Upweight elements that are distinctive for that particular area (WGeo).

Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf
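The two ideas on this slide can be sketched roughly as follows. This is not the method from the paper: it assumes that visual elements are already available as discrete ids, pools them per area (standing in for Eg), and uses an IDF-style weight over areas as a stand-in for WGeo. All names and data are hypothetical.

```python
import math


def geo_scores(query_elements, area_images):
    """Score each candidate area for a query image.

    area_images maps an area name to a list of sets of visual element ids,
    one set per image taken in that area.
    """
    # Combine evidence from all images taken in an area.
    area_elements = {g: set().union(*images) for g, images in area_images.items()}
    n_areas = len(area_elements)
    scores = {}
    for g, elements in area_elements.items():
        score = 0.0
        for e in query_elements & elements:
            n_with_e = sum(1 for other in area_elements.values() if e in other)
            # Elements found in few areas are distinctive and count more.
            score += math.log(n_areas / n_with_e)
        scores[g] = score
    return scores


areas = {
    "Delft": [{"canal", "brick_gable"}, {"canal", "blue_tile"}],
    "Amsterdam": [{"canal", "bike"}],
    "Nijmegen": [{"bridge", "bike"}],
}
scores = geo_scores({"canal", "blue_tile"}, areas)
print(max(scores, key=scores.get))  # prints Delft
```

The element "canal" matches two areas and therefore contributes little, while "blue_tile" occurs in only one area and decides the prediction.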


Good match: Lots of what’s unique


Visual Geo-location prediction

Xinchao Li, Alan Hanjalic, Martha Larson. Geo-distinctive Visual Element Matching for Location Estimation of Images, Under review. http://arxiv.org/pdf/1601.07884v1.pdf


Conventional search engine finds “what”

Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.

I want a song called “koi pond”.

I’m interested in garden koi ponds.


Intent-aware search responds to “why”

Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.

I am interested in the significance of koi ponds.

I want to build a koi pond.


User intent in video search

Our study identified five major reasons why people search for videos online:

● Information (declarative knowledge)
● Experience for Learning (performative knowledge)
● Experience for Exposure (“being there”)
● Affect (change of mood)
● Object (video as video)

Alan Hanjalic, Christoph Kofler, and Martha Larson. 2012. Intent and its discontents: the user at the wheel of the online video search engine. In Proceedings of the 20th ACM international conference on Multimedia (MM '12). ACM, New York, NY, USA, 1239-1248.


Why are video moments important?

R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015.


Viewer Expressive Reactions

R. Vliegendhart, M. Larson, B. Loni and A. Hanjalic, "Exploiting the Deep-Link Commentsphere to Support Non-Linear Video Access," in IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1372-1384, Aug. 2015.

Expressive reactions are not emotional in the classic sense.

They are also not completely personal... but...


The way people take a picture reflects what they are taking a picture of.

Pixel statistics reveal very simple information on how people take pictures.

We need people to judge if the computer guesses right.

Michael Riegler, Martha Larson, Mathias Lux, and Christoph Kofler. 2014. How 'How' Reflects What's What: Content-based Exploitation of How Users Frame Social Images. In Proceedings of the 22nd ACM international conference on Multimedia (MM '14).

Fashion and framing


Characterize the trend...


Jacket types are already very difficult for computers!


Crowdsourcing

People interpret images in exchange for micropayments.

Example: Amazon Mechanical Turk
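Several workers usually judge the same image, so their answers have to be combined. The sketch below assumes simple majority voting and reports the share of workers who agreed; real crowdsourcing pipelines often add quality control on top. The function name, the tuple layout and the jacket labels are hypothetical.

```python
from collections import Counter


def aggregate(judgments):
    """Combine crowd labels per image by majority vote.

    judgments is a list of (image_id, worker_id, label) tuples.
    Returns a dict mapping image_id to (winning_label, agreement).
    """
    by_image = {}
    for image_id, _worker_id, label in judgments:
        by_image.setdefault(image_id, []).append(label)
    result = {}
    for image_id, labels in by_image.items():
        # On a tie, Counter keeps the label that was seen first.
        label, votes = Counter(labels).most_common(1)[0]
        result[image_id] = (label, votes / len(labels))
    return result


judgments = [
    ("img1", "w1", "bomber"),
    ("img1", "w2", "bomber"),
    ("img1", "w3", "blazer"),
    ("img2", "w1", "parka"),
    ("img2", "w4", "parka"),
]
print(aggregate(judgments)["img2"])  # prints ('parka', 1.0)
```

A low agreement value flags images that people themselves find hard to interpret.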


MediaEval 2016
Multimedia Benchmark Initiative

moving forward with benchmarking


MediaEval Multimedia Evaluation Benchmark

● offers tasks on multimedia access and retrieval,
● exploits features derived from multiple modalities: speech, audio, visual content, tags, users, context,
● solutions may or may not involve machine learning.

multimediaeval.org

This year: MediaEval workshop is right after ACM Multimedia 2016 in Amsterdam


Example MediaEval Tasks

● Predicting Media Interestingness: Infer interesting frames and segments of movies (using audio, visual features, text).
● Retrieving Diverse Social Images: Diversify image results lists (text, visual features).
● Context of Multimedia Experience: Predict multimedia content suitable for watching in stressful situations.
● Person Discovery: finding people in broadcast content.
● Placing: geo-location estimation for social multimedia.

multimediaeval.org


Publications arising from MediaEval

http://www.citeulike.org/group/16499


2015 Workshop Participants

80 participants from 25 countries

multimediaeval.org


MediaEval Proceedings Papers

multimediaeval.org


What sets MediaEval apart?

• … emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.

• … innovates new tasks and techniques focusing on the human and social aspects of multimedia content.

• … community driven.

multimediaeval.org


Predicting Media Interestingness Task

Automatically select frames or portions of movies which are the most interesting for a common viewer.

● Goal: Make use of the visual, audio and text content (features provided).

● Data: ca. 100 movie trailers, together with human annotations.

● Metric: System performance is to be evaluated using standard Mean Average Precision.
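For readers unfamiliar with the metric, here is a minimal sketch of Mean Average Precision. It assumes binary relevance (a frame or segment is either interesting or not) and one ranked list per trailer; the official evaluation tooling may differ in details, and the identifiers below are made up.

```python
def average_precision(ranked, relevant):
    """Average of precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked_list, relevant_set) pairs, one per trailer."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["f3", "f1", "f2"], {"f3", "f2"}),  # AP = (1/1 + 2/3) / 2
    (["s1", "s2"], {"s2"}),              # AP = 1/2
]
print(round(mean_average_precision(runs), 4))  # prints 0.6667
```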


Predicting Media Interestingness Task

http://multimediaeval.org


Retrieving Diverse Social Images Task

This task addresses the problem of image search result diversification in the context of social media:

● Goal: refine a ranked list of Flickr photos retrieved with general purpose multi-topic queries using provided visual, textual and user tagging credibility information.

● Metrics: results are evaluated with respect to their relevance to the query and the diverse representation of it.

● Data: ~40k images, social metadata, text models, CNN descriptors, user tagging credibility dataset, etc.

Three data sets have been published at the MMSys dataset track.
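One common way to diversify a ranked list is greedy re-ranking in the style of Maximal Marginal Relevance (MMR). The sketch below uses that technique as an illustration only; it is not the task baseline. It assumes that each photo comes with a relevance score and a set of tags, and that Jaccard overlap between tag sets measures redundancy. All names and data are hypothetical.

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def diversify(photos, k, trade_off=0.5):
    """Greedy MMR-style re-ranking.

    photos is a list of (photo_id, relevance, tag_set) tuples.
    trade_off=1.0 ranks by relevance only; lower values penalize redundancy more.
    """
    remaining = list(photos)
    selected = []
    while remaining and len(selected) < k:

        def marginal_value(photo):
            _photo_id, relevance, tags = photo
            # Redundancy: similarity to the closest photo already selected.
            redundancy = max((jaccard(tags, s[2]) for s in selected), default=0.0)
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(remaining, key=marginal_value)
        selected.append(best)
        remaining.remove(best)
    return [photo[0] for photo in selected]


photos = [
    ("a", 0.90, {"eiffel", "night"}),
    ("b", 0.85, {"eiffel", "night"}),
    ("c", 0.80, {"eiffel", "day", "crowd"}),
    ("d", 0.40, {"louvre"}),
]
print(diversify(photos, k=3))  # prints ['a', 'c', 'd']
```

Photo "b" is nearly as relevant as "a" but shows the same thing, so it is pushed out of the top three in favor of photos that cover other aspects of the query.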


Retrieving Diverse Social Images Task (cont.)

[Figure: initial retrieval results compared with diversified results]


Context of Multimedia Experience Task

Develops multimodal techniques for automatically predicting multimedia that is suitable for a particular consumption context.

● Goal: Predict movies that are suitable to watch on airplanes.

● Data: Input to the prediction methods is movie trailers, and metadata from IMDb, Rotten Tomatoes and Metacritic.

● Metric: Output is evaluated using the Weighted F1 score, with expert labels as ground truth.
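The sketch below shows one way to compute a weighted F1 score, assuming the usual definition: per-class F1 averaged with weights proportional to the number of true instances of each class. Whether the task uses exactly this variant is an assumption, and the labels in the example are invented.

```python
def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    total = 0.0
    for label in set(y_true):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        support = tp + fn  # number of true instances of this class
        total += f1 * support
    return total / len(y_true)


expert = ["suitable", "suitable", "suitable", "unsuitable"]
system = ["suitable", "suitable", "unsuitable", "unsuitable"]
print(round(weighted_f1(expert, system), 4))  # prints 0.7667
```

Here the "suitable" class reaches F1 = 0.8 with three true instances and the "unsuitable" class reaches F1 = 2/3 with one, giving (3 × 0.8 + 1 × 2/3) / 4.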

This year: Task is offered at the MediaEval workshop and at a joint-challenge workshop at http://www.icpr2016.org


Context of Multimedia Experience Task

Different context can lead to different preferences...

...people like to watch different movies than they would at home or in the cinema.


Multimodal Person Discovery in Broadcast TV Task

● Goal: Given raw TV broadcasts, each shot must be automatically tagged with the name(s) of the people who can be both seen and heard in the shot.

● The list of people is not known a priori and their names must be discovered in an unsupervised way from provided text overlay or speech transcripts.

● Data: Multilingual corpus from INA (French), DW (German & English) and UPC (Catalan)

● Metric: standard information retrieval metrics based on a posteriori collaborative annotation of the corpus by the participants themselves.


Person Discovery Task

Person names must be discovered in speech track and/or sub-titles. Models cannot be trained on external data.

Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015


Tackling the Person Discovery Task

Slide credit: Johann Poignant, Hervé Bredin, Claude Barras, Person Discovery Task Organizers MediaEval 2015


Wrap Up

● We want to connect users with information, in order to satisfy information needs.
● CS Love: Lots of data!
● CS Hate: How do people really see multimedia, what do they want?
● Way forward: Continue to define new challenges and build algorithms to address them.


Beyond the user-item matrix

CrowdRec project

● Exploiting multiple sources of information,
● Leveraging the Crowd (crowdworkers, users, curators),
● Evaluating large scale.

Context-driven Recommender systems:

“People have more in common with other people in the same situation than they do with past versions of themselves”

Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, and Massimo Quadrana. The Contextual Turn: from Context-aware to Context-driven recommender systems. ACM RecSys 2016, to appear.


Turn from personalization

• Context has been taken into account by coupling it with personalization, with context-aware recommender systems

• However being aware of the context is not enough for some domains: recommendations should be driven by the context

In traditional recsys, Immutable Preference paradigm (ImP):

• User tastes do not evolve

• Goals and needs are static

• Item catalog is static

• Trendiness, seasonality, capacity and life-cycle are addressed by tweaks to existing models

Slide credit: Roberto Pagano



Music

I usually like heavy metal music, but now I have to work and I want to listen to some soft music

Recommended for you:

Slide credit: Roberto Pagano
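The music example can be sketched in a few lines. This is a deliberately naive illustration, not the approach from the paper: it assumes that the context is a known label on every log entry and recommends what is most popular among people in the same situation, ignoring the listener's own history. All names and data are hypothetical.

```python
from collections import Counter


def recommend_for_context(context, log, k=2):
    """Rank items by how often they were consumed in the given context.

    log is a list of (user, context, item) tuples. The identity of the
    user asking for recommendations is deliberately not used.
    """
    counts = Counter(item for _user, ctx, item in log if ctx == context)
    # Most frequent first; ties are broken alphabetically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _count in ranked[:k]]


log = [
    ("ann", "working", "soft piano"),
    ("bob", "working", "soft piano"),
    ("bob", "working", "ambient"),
    ("me", "commuting", "heavy metal"),
    ("me", "commuting", "heavy metal"),
    ("me", "gym", "heavy metal"),
]
print(recommend_for_context("working", log))  # prints ['soft piano', 'ambient']
```

A purely personalized system would keep suggesting heavy metal to "me"; driving the recommendation by the situation yields soft music instead.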


Jaeyoung Choi, Eungchan Kim, Martha Larson, Gerald Friedland, and Alan Hanjalic. 2015. Evento 360: Social Event Discovery from Web-scale Multimedia Collection. ACM Multimedia 2015, pp. 193-196.


Thank you

Mohammad Soleymani, Guillaume Gravier, Bogdan Ionescu, Gareth Jones, Claire-Helene Demarty, Ngoc Duong, Frédéric Lefebvre, Yu-Gang Jiang, Bogdan Ionescu, Mats Sjöberg, Hanli Wang, Toan Do, Richard Sutcliffe, Chris Fox, Richard Lewis, Tom Collins, Eduard Hovy, Deane L. Root, Igor Szoke, Xavier Anguera, Claude Barras, Hervé Bredin, Camille Guinaudeau, Jean Carrive, Yannick Estève, Javier Hernando, Juliette Kahn, Nam Le, Sylvain Meignier, Ramon Morros, Johann Poignant, Satoshi Tamura, Bart Thomee, Olivier Van Laere, Claudia Hauff, Jaeyoung Choi, Emmanuel Dellandréa, Liming Chen, Yoann Baveye, Mats Sjöberg, Christina Boididou, Symeon Papadopoulos, Stuart E. Middleton, Michael Riegler, Duc Tien Dang Nguyen, Giulia Boato, Andreas Petlund, Michael Riegler, Concetto Spampinato, Bogdan Ionescu, Alexandru Lucian Gînscă, Maia Zaharieva, Mihai Lupu, Henning Müller, Adrian Popescu, Bogdan Boteanu, Alan Woodley, Shlomo Geva, Timothy Chappell, Richi Nayak, Gabi Constantin, Roberto Pagano, Paolo Cremonesi, Martha Larson, Balazs Hidasi, Domonkos Tikk, Alexandros Karatzoglou, Massimo Quadrana, Xinchao Li, Alan Hanjalic, Andreas Lommatzsch, Benjamin Kille, Fabian Abel, Daniel Kohlsdorf, Jonas Seiler, Róbert Pálovics, Andras Benczur...


Links

● Challenges (Benchmarks)
  ○ MediaEval Multimedia Evaluation (http://multimediaeval.org),
  ○ CLEF NewsREEL News Recommendation challenge (http://www.clef-newsreel.org),
  ○ ACM RecSys 2016 Job Recommendation challenge (http://2016.recsyschallenge.com).

● Acknowledgements
  ○ Multimedia Commons (http://www.multimediacommons.org),
  ○ EC-funded CrowdRec project (http://crowdrec.eu).