Page 1: Machine learning and multimedia information retrieval

Machine Learning and Multimedia Information Retrieval*

Integrated Knowledge Solutions

* Based on a talk at ICMLA Conference

Page 2: Machine learning and multimedia information retrieval

Outline

• Introduction

• Bridging the Semantic Gap

• Events in Videos

• Use of Tagging in MIR

• Killer Apps of MIR

• Take Home Message

12/12/2010 2 ICMLA Talk

Page 3: Machine learning and multimedia information retrieval

Too Much Information

Which is more frustrating? (According to a survey by Xerox)

• Being stuck in traffic on the way to or from work

• Not being able to find information you urgently need

Page 4: Machine learning and multimedia information retrieval

Not a New Problem

Nalanda University was one of the first universities in the world, founded in the 5th century AD on a site reported to have been visited by the Buddha during his lifetime. At its peak in the 7th century AD, when the Chinese scholar Xuanzang visited it, Nalanda held some 10,000 students.

The Royal Library of Alexandria, in Egypt, seems to have been the largest and most significant library of the ancient world. It functioned as a major center of scholarship from its construction in the third century B.C. until the Roman conquest of Egypt in 30 B.C.

Page 5: Machine learning and multimedia information retrieval

However, Earlier

Data Producers

Data Consumers

Page 6: Machine learning and multimedia information retrieval

But Nowadays

Page 7: Machine learning and multimedia information retrieval

Some Relevant Numbers

Photobucket has 6.2 billion photos and Flickr has over 2 billion.

Facebook has over 10 billion photos and over 400 million active users.

Page 8: Machine learning and multimedia information retrieval

Phenomenon

• 24 hours of video are uploaded to YouTube every minute

• YouTube streams 2 billion videos every day

Page 9: Machine learning and multimedia information retrieval

So how do we get help in finding the desired multimedia information?

MIR

Page 10: Machine learning and multimedia information retrieval

So What is MIR?

• Also known as CBIR (Content-based Image Retrieval) and CBVIR (Content-based Visual Information Retrieval)

• Deals with systems that manage multimedia documents (images, videos, audio clips, slides, etc.) and facilitate searching them based on their content

Page 11: Machine learning and multimedia information retrieval

History of MIR

• Conference on Database Techniques for Pictorial Applications, 1979 (Florence, Italy)

• NSF Workshop on Visual Information Management Systems, 1992 (Redwood, CA)

• QBIC (Query By Image Content), 1993 (SPIE Conference on Storage and Retrieval for Image and Video Databases); also the first ACM Multimedia Conference

• Shift to semantic similarity from signal similarity, 1999

• Community tagging, photo and video sharing sites, 2002

Page 12: Machine learning and multimedia information retrieval

A Typical MIR System

[Block diagram. Blocks: Media Collection, Feature Extraction, Features, Indexing & Matching, Query, Feature Extraction, Retrieved Results, Relevance Feedback]
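The blocks of this diagram can be sketched in a few lines of Python. This is only an illustrative toy, not the system shown on the slide: the gray-level histogram feature, the Euclidean matching, the mean-shift style feedback rule, and every name and data value below are assumptions made for the example.

```python
import math

def extract_features(pixels, bins=4):
    """Toy feature extractor: normalized gray-level histogram of 0-255 values."""
    hist = [0.0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_index(collection):
    """Offline step: extract and store features for every item in the collection."""
    return {item_id: extract_features(px) for item_id, px in collection.items()}

def search(index, query_vec, k=2):
    """Online step: rank indexed items by distance to the query features."""
    ranked = sorted(index, key=lambda i: (distance(index[i], query_vec), i))
    return ranked[:k]

def relevance_feedback(query_vec, relevant_vecs, alpha=0.5):
    """Move the query toward the mean of the items the user marked relevant."""
    mean = [sum(col) / len(relevant_vecs) for col in zip(*relevant_vecs)]
    return [(1 - alpha) * q + alpha * m for q, m in zip(query_vec, mean)]

# Hypothetical three-item collection of tiny gray-level "images".
collection = {
    "dark": [10, 20, 30, 40],
    "bright": [200, 220, 240, 250],
    "mixed": [10, 100, 150, 250],
}
index = build_index(collection)
query = extract_features([5, 15, 25, 120])
first_pass = search(index, query)                      # initial ranking
refined = relevance_feedback(query, [index["mixed"]])  # user marks "mixed" as relevant
second_pass = search(index, refined)                   # ranking after feedback
```

In this toy run the first pass ranks "dark" ahead of "mixed", and marking "mixed" as relevant reverses that order in the second pass. A real system would replace the histogram with richer features and the linear scan with a proper index.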

Page 13: Machine learning and multimedia information retrieval

Semantic Gap

Early systems retrieved documents that were visually similar to the query (similar at the signal level) but did not necessarily depict the same semantic concept.

A. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, “Content-Based Image Retrieval at the End of the Early Years,” IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2000

http://www.searchenginejournal.com/7-similarity-based-image-search-engines/8265/

Page 14: Machine learning and multimedia information retrieval

Semantic Gap

Users also like to query using descriptive words rather than query images or other multimedia objects. This requires MIR systems to correlate low-level features with high-level concepts.

Visually dissimilar images representing the same concept.

Page 15: Machine learning and multimedia information retrieval

How to Bridge the Semantic Gap?

Exploit context:

• Text surrounding images

• Associated sound track and closed captions in videos

• Query history

Use machine learning to:

• Build image category classifiers to perform semantic filtering of the results

• Build specific detectors for objects to associate concepts with images

• Build object models using low-level features

Page 16: Machine learning and multimedia information retrieval

Exploiting Context: An Example

Kulesh, Petrushin and Sethi, “The PERSEUS Project: Creating Personalized Multimedia News Portal,” Proceedings Second Int’l Workshop on Multimedia Data Mining, 2001

Page 17: Machine learning and multimedia information retrieval

Example of Using Surrounding Text

Page 18: Machine learning and multimedia information retrieval

Context via Surrounding Text

Page 19: Machine learning and multimedia information retrieval

Context Via Surrounding Text: One More Example

Page 20: Machine learning and multimedia information retrieval

Better Context with More Text

Page 21: Machine learning and multimedia information retrieval

Improving Context via More Words per Query

Page 22: Machine learning and multimedia information retrieval

Issues Unique to ML for MIR

• Simultaneous presence of multiple concepts

• How to extract/isolate concept-specific features? Segment or do not segment?

• Imbalance between positive and negative examples

• Extremely large number of concepts for a general-purpose MIR system

[Example image with multiple concepts: romance, couple, beach, sundown. From: s163.photobucket.com]

Page 23: Machine learning and multimedia information retrieval

A Template Relating Concepts with Pictures

[Diagram levels: Concepts, Image Tokens, Images]

Page 24: Machine learning and multimedia information retrieval

Feature Extraction Issues

• Whole-image features: easy to use but not very effective

• Region-based features: both regular region structures and segmented regions are popular

• Salient-object features: connected regions corresponding to the dominant visual properties of objects in an image

Page 25: Machine learning and multimedia information retrieval

Scale Invariant Feature Transform (SIFT) Descriptors

SIFT descriptors and their variants are currently the most popular features in use. Each image typically yields thousands of features (keypoint descriptors), each usually consisting of 128 values.

http://www.vlfeat.org/

D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV, 2004.
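The 128 values come from the standard layout in Lowe's paper: a 4 × 4 grid of cells around the keypoint, each holding an 8-bin orientation histogram. Descriptors from two images are then commonly matched with a nearest-neighbor ratio test. The sketch below illustrates that test on made-up 128-dimensional vectors; it does not compute real SIFT descriptors, and the function and variable names are hypothetical.

```python
import math

DESCRIPTOR_LENGTH = 4 * 4 * 8  # 4x4 spatial cells, 8 orientation bins each = 128

def l2(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def ratio_test_matches(query_desc, db_desc, ratio=0.8):
    """Keep a match only if the nearest neighbor is clearly closer than the second nearest."""
    matches = []
    for qi, q in enumerate(query_desc):
        dists = sorted((l2(q, d), di) for di, d in enumerate(db_desc))
        if len(dists) < 2:
            continue
        (best, best_i), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((qi, best_i))
    return matches

# Made-up descriptors standing in for real SIFT output.
database = [
    [0.0] * DESCRIPTOR_LENGTH,
    [1.0] * DESCRIPTOR_LENGTH,
    [0.0] * 64 + [1.0] * 64,
]
queries = [
    [0.05] * DESCRIPTOR_LENGTH,  # very close to database[0]
    [0.5] * DESCRIPTOR_LENGTH,   # equally far from all three: ambiguous
]
matches = ratio_test_matches(queries, database)
```

The first query is matched to database entry 0; the second is rejected because its best and second-best distances are equal. With thousands of descriptors per image, a brute-force scan like this is too slow in practice, so approximate nearest-neighbor search is normally used instead.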

Page 26: Machine learning and multimedia information retrieval

Feature Discovery

The basic idea is to discover the features best suited to a given collection.

Mukhopadhyay, Ma, and Sethi, “Pathfinder Networks for Content Based Image Retrieval Based on Automated Shape Feature Discovery,” ISMSE 2004

Page 27: Machine learning and multimedia information retrieval

Image Category Classifiers (ICC)

• Trained using both supervised and unsupervised learning methods (SVM, decision trees, AdaBoost, VQ, etc.)

• Early work was limited to a few tens of categories; some current systems, however, can work with thousands of categories/concepts

Page 28: Machine learning and multimedia information retrieval

VQ Based Image Category Classifier

[Diagram. Blocks: Test Image; Water Codebook, Sky Codebook, Fire Codebook; Best Codebook Label]

Mustafa & Sethi (2004)
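One plausible reading of the diagram is that each category has its own codebook and the test image receives the label of the codebook that encodes it best. The sketch below assumes "best" means lowest average quantization distortion; that criterion, the mean-color block features, and all codeword values are assumptions for illustration and may differ from the method in the cited work. Codebook training (for example by k-means) is omitted.

```python
def l2sq(a, b):
    """Squared Euclidean distance between a block feature and a codeword."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distortion(blocks, codebook):
    """Average squared error when each block is replaced by its nearest codeword."""
    return sum(min(l2sq(b, c) for c in codebook) for b in blocks) / len(blocks)

def classify(blocks, codebooks):
    """Return the label whose codebook encodes the image with the least distortion."""
    return min(codebooks, key=lambda label: distortion(blocks, codebooks[label]))

# Hypothetical per-category codebooks of mean RGB colors.
codebooks = {
    "water": [(20, 80, 160), (40, 110, 190)],
    "sky": [(130, 190, 240), (200, 225, 250)],
    "fire": [(230, 90, 20), (250, 170, 40)],
}
# Mean colors of three blocks from a made-up test image.
test_blocks = [(240, 100, 30), (245, 160, 50), (225, 95, 25)]
label = classify(test_blocks, codebooks)
fire_distortion = distortion(test_blocks, codebooks["fire"])
```

Here the reddish test blocks are encoded with far less error by the fire codebook than by the other two, so the image is labeled "fire".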

Page 29: Machine learning and multimedia information retrieval

Object Detectors

PASCAL Visual Object Classes Challenge

Page 30: Machine learning and multimedia information retrieval

LabelMe Project

http://labelme.csail.mit.edu/

Web-based annotation tool to segment and label image regions. Labeled objects in images are used as training images to build object detectors.

Page 31: Machine learning and multimedia information retrieval

Image Category Classifiers Examples

IMARS provides a large number of built-in classifiers for visual categories that cover places, people, objects, settings, activities and events, and it is easy to add new ones. IMARS can run on a PC or laptop (a trial version is available at IBM alphaWorks) and can also work at large scale for high-volume batch processing of millions of images and videos per day. Several demos of IMARS are available.

Page 32: Machine learning and multimedia information retrieval

Image Classification via Probabilistic Modeling

Semantic labeling. (a) An MPE (minimum probability of error) semantic retrieval system groups images by semantic concept and learns a probabilistic model for each concept. (b) The system represents each image by a vector of posterior concept probabilities.

N. Vasconcelos, “From Pixels to Semantic Spaces: Advances in Content-Based Image Retrieval,” IEEE Computer, July 2007
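The vector of posterior concept probabilities on this slide follows from Bayes' rule once each concept model supplies a likelihood for the image. The sketch below shows only that arithmetic; the likelihood values are invented, the uniform prior is an assumption, and the concept models that would produce the likelihoods are not implemented.

```python
def concept_posteriors(likelihoods, priors=None):
    """Bayes' rule: P(c | x) = p(x | c) P(c) / sum over c' of p(x | c') P(c')."""
    concepts = list(likelihoods)
    if priors is None:  # assume all concepts are equally likely a priori
        priors = {c: 1.0 / len(concepts) for c in concepts}
    joint = {c: likelihoods[c] * priors[c] for c in concepts}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in concepts}

# Hypothetical likelihoods p(x | concept) of one image under three concept models.
likelihoods = {"beach": 0.06, "sky": 0.03, "city": 0.01}
posteriors = concept_posteriors(likelihoods)
best = max(posteriors, key=posteriors.get)  # choosing the largest posterior minimizes error probability
```

With these numbers the image is represented by posteriors of roughly 0.6, 0.3 and 0.1 for beach, sky and city. Two images can then be compared in this semantic space rather than in pixel or feature space.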

Page 33: Machine learning and multimedia information retrieval

Retrieving Events in Videos

• An event in MIR implies an interesting spatiotemporal instance

• Considerable work in the MIR community on events, driven by the popularity of sports videos

• Also tremendous interest in detecting and recognizing events with potential homeland security applications

Page 34: Machine learning and multimedia information retrieval

Event Retrieval Examples: Supervised Approach

Mustafa & Sethi AVSS Conference 2005

Page 35: Machine learning and multimedia information retrieval

Unsupervised Learning for Event Retrieval

Mustafa & Sethi, ICTAI 2007

Page 36: Machine learning and multimedia information retrieval

Unsupervised Learning Based Event Retrieval

Mustafa & Sethi, ICTAI 2007

Page 37: Machine learning and multimedia information retrieval

Retrieval By Cross-Modal Associations

Approaches:

• Latent semantic indexing (LSI)

• Cross-modal factor analysis (CFA)

• Canonical correlation analysis (CCA)

• Use a query from one modality (e.g. audio) to retrieve content in a different modality (e.g. video)

• Operate directly on low-level features

Li, Dimitrova, Li and Sethi (ACM MM 03)
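For reference, the objectives behind two of these approaches can be written compactly. The forms below are the usual textbook statements as I understand them, and the notation is an assumption of this note rather than the slide's: x and y are paired feature vectors from the two modalities (say audio and video), Σ denotes their covariance matrices, and X, Y stack n paired, mean-centered samples as rows.

```latex
% CCA: find projection directions a, b whose projections are maximally correlated
\[
(a^{\ast}, b^{\ast}) = \arg\max_{a,\,b}\;
\frac{a^{\top}\Sigma_{xy}\,b}
     {\sqrt{a^{\top}\Sigma_{xx}\,a}\;\sqrt{b^{\top}\Sigma_{yy}\,b}}
\]

% CFA: find orthonormal transforms A, B that bring the paired samples close together
\[
\min_{A,\,B}\; \lVert XA - YB \rVert_{F}^{2}
\quad \text{subject to} \quad A^{\top}A = I,\; B^{\top}B = I
\]
```

In either case, a query from one modality is projected into the shared space and compared there with projected items from the other modality.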

Page 38: Machine learning and multimedia information retrieval

Talking Face Example

[Diagram. Blocks: Query, Collection of Image Sequences, Feature Extraction (two blocks), Cross-Modal Association, Retrieval Results]

M. Li, D. Li, Dimitrova and Sethi, “Audio-Visual Talking Face Detection,” Proceedings, ICME, 2003

Page 39: Machine learning and multimedia information retrieval

Tagging in MIR

All time most popular tags at Flickr

Page 40: Machine learning and multimedia information retrieval

About Tags

• User centered

• Imprecise and often overly personalized

• Tag distribution follows a power law

• Most users use very few distinct tags, while a small group of users work with an extremely large set of tags
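A quick way to check the power-law claim on a tag collection is to sort tag frequencies by rank and fit a line in log-log space; a roughly straight line with negative slope is consistent with a power law. The sketch below uses synthetic counts built to follow frequency ∝ 1/rank, so the data and names are purely illustrative, and a least-squares fit like this is only a rough diagnostic.

```python
import math
from collections import Counter

def rank_frequency(tags):
    """Tag counts sorted from most to least frequent."""
    return sorted(Counter(tags).values(), reverse=True)

def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic tag stream: the tag at rank r is used 120 / r times.
tag_stream = []
for rank in range(1, 7):
    tag_stream += ["tag%d" % rank] * (120 // rank)

freqs = rank_frequency(tag_stream)
slope = loglog_slope(freqs)  # close to -1 for this synthetic data
```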

Page 41: Machine learning and multimedia information retrieval

How are Tags Being Used in MIR?

Relating tags in different languages through visual features

Aurnhammer, Hanappe and Steels Proc. WWW2006

Page 42: Machine learning and multimedia information retrieval

Tag Suggester

Kucuktunc, Sevil, Tosun, Zitouni, Duygulu, and Can (SAMT 08)

Page 43: Machine learning and multimedia information retrieval

Collaborative Tags

• Also known as folksonomy, social tagging, and social classification

• Great for content characterization

• The tag size represents the number of times the tag has been applied to the same item by different users; it roughly indicates the level of agreement/confidence in a tag
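The tag-size idea amounts to counting, for each item, how many distinct users applied each tag. A minimal sketch, with made-up users, items and tags, and with case folding added as an assumption:

```python
from collections import defaultdict

def tag_weights(assignments):
    """For each item, count how many distinct users applied each tag."""
    users = defaultdict(set)
    for user, item, tag in assignments:
        users[(item, tag.lower())].add(user)  # repeated tags by one user count once
    weights = defaultdict(dict)
    for (item, tag), who in users.items():
        weights[item][tag] = len(who)
    return dict(weights)

# Hypothetical (user, item, tag) assignments.
assignments = [
    ("ann", "photo1", "beach"),
    ("bob", "photo1", "Beach"),
    ("ann", "photo1", "beach"),
    ("cat", "photo1", "sunset"),
    ("bob", "photo2", "dog"),
]
weights = tag_weights(assignments)
```

Here "beach" on photo1 gets weight 2 because two different users agree on it, while the repeat by the same user is ignored. Such weights can drive the font size in a tag cloud or serve as confidence values when tags are used as training labels.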

Page 44: Machine learning and multimedia information retrieval

Decision Tree Based Tagger

• Uses social tags in binary/weighted mode

• Generates/suggests multiple tags through a single decision tree classifier

First, the label vectors associated with training vectors are clustered into two initial groups

Next, the SVM is used on training vectors to yield the split that best matches the clustering result

An impurity based measure is used to iteratively adjust the split, if needed

Ma, Sethi, and Patel, “Multilabel Classification Method for Multimedia Tagging,” IJMDEM, 2010
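The first and last of the three steps on this slide can be sketched as follows. This is a loose illustration, not the cited algorithm: 2-means is assumed for the initial clustering of label vectors, mean per-tag Gini impurity is an assumed stand-in for the impurity measure, and the SVM step that fits a split in feature space is left out entirely. All names and data are hypothetical.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two label vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_group_clustering(label_vectors, iters=10):
    """2-means on label vectors; returns a group id (0 or 1) per training item."""
    c0 = list(label_vectors[0])
    c1 = list(max(label_vectors, key=lambda v: sqdist(v, c0)))  # farthest from c0
    groups = []
    for _ in range(iters):
        groups = [0 if sqdist(v, c0) <= sqdist(v, c1) else 1 for v in label_vectors]
        for gid in (0, 1):
            members = [v for v, g in zip(label_vectors, groups) if g == gid]
            if members:  # keep the old centroid if a group becomes empty
                mean = [sum(col) / len(members) for col in zip(*members)]
                if gid == 0:
                    c0 = mean
                else:
                    c1 = mean
    return groups

def tag_impurity(label_vectors):
    """Mean per-tag Gini impurity 2p(1-p); an assumed stand-in measure."""
    if not label_vectors:
        return 0.0
    n = len(label_vectors)
    ginis = [2 * (sum(col) / n) * (1 - sum(col) / n) for col in zip(*label_vectors)]
    return sum(ginis) / len(ginis)

def split_impurity(label_vectors, groups):
    """Size-weighted impurity of the two groups produced by a split."""
    total = 0.0
    for gid in (0, 1):
        members = [v for v, g in zip(label_vectors, groups) if g == gid]
        total += len(members) * tag_impurity(members)
    return total / len(label_vectors)

# Hypothetical binary label vectors over the tags (beach, sunset, city, night).
labels = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 0, 0]]
groups = two_group_clustering(labels)
before = tag_impurity(labels)
after = split_impurity(labels, groups)
```

On this toy data the clustering separates the beach/sunset items from the city/night items, and the split lowers the impurity. In the approach described on the slide, a classifier trained on the feature vectors would then try to reproduce this grouping at each tree node.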

Page 47: Machine learning and multimedia information retrieval

Current Status of MIR

• Extensive interest as evident from conferences, journals, and special issues

• Most in the multimedia (MM) community are happy with the progress

• There is a gap between published results and the results of publicly available systems on the web (http://www.theopavlidis.com/technology/CBIR/PaperB/icpr08.htm)

• Lack of application focus

• Plenty of scope for machine learning to help improve the performance of MIR systems

• Killer applications are beginning to emerge

Page 48: Machine learning and multimedia information retrieval

MIR Application Examples

Tattoo-ID: Automatic Tattoo Image Retrieval for Suspect & Victim Identification (Anil K. Jain, Jung-Eun Lee, and Rong Jin)

Page 49: Machine learning and multimedia information retrieval

Biological and Medical Data Retrieval

http://www.cs.washington.edu/research/VACE/Multimedia/

Page 50: Machine learning and multimedia information retrieval

Killer Apps?

Page 51: Machine learning and multimedia information retrieval

http://www.iqengines.com/applications.php

Page 52: Machine learning and multimedia information retrieval

http://www.iqengines.com/applications.php

Page 53: Machine learning and multimedia information retrieval

http://www.thingd.com

Bloomberg Businessweek, Nov. 29, 2010

Page 55: Machine learning and multimedia information retrieval

Take Home Message

• MIR is emerging in the commercial domain. A lot more activity is expected in the near future

• The MIR community is obsessed with a general-purpose retrieval engine, a folly the computer vision community pursued for a long time

• ML is playing a vital role in MIR

• Approaches combining social search and visual search techniques are expected to gain prominence

Page 56: Machine learning and multimedia information retrieval

Acknowledgement

• This presentation is based on the work of numerous researchers from the MIR/ML/CVPR community. I have tried to give credit/references wherever possible. Any omission is unintentional and I apologize for that.

• I also want to thank my present and past students and collaborators.

Page 57: Machine learning and multimedia information retrieval

Questions?
