Flickr Distance ACM Multimedia 2008 Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li Microsoft Research Asia University of Science and Technology of China October 28, 2008

Page 1

Flickr Distance

ACM Multimedia 2008

Lei Wu, Xian-Sheng Hua, Nenghai Yu, Wei-Ying Ma, Shipeng Li

Microsoft Research Asia; University of Science and Technology of China

October 28, 2008

Page 2

Multimedia Information Retrieval

Indexing, Ranking, Clustering, Recommendation, Annotation, ……

Page 3

Multimedia Information Retrieval

Image Similarity/Distance

Concept Similarity/Distance

Annotation, Indexing, Ranking, Clustering, Recommendation, ……

Page 4

Image Similarity/Distance

Concept Similarity/Distance

Page 5

Image Similarity/Distance: Numerous efforts have been made.

Concept Similarity/Distance

Page 6

Image Similarity/Distance: Numerous efforts have been made.

Concept Similarity/Distance: More and more used, but not well studied.

Example concepts: Olympic, Sports, Cat, Tiger, Paw

Page 7

WordNet Distance

Google Distance

Tag Concurrence Distance

Page 8

WordNet Distance

WordNet: 150,000 words

WordNet Distance
Quite a few methods exist to compute it in WordNet
The basic idea is to measure the length of the path between two words

Pros and Cons
Pros: Built by human experts, so close to human perception
Cons: Coverage is limited and difficult to extend
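
The path-length idea can be illustrated with a minimal sketch. The taxonomy below is a hypothetical toy graph, not real WordNet data, and `path_length` is an illustrative helper name; actual WordNet measures differ in how they weight and normalize the path.

```python
from collections import deque

# Hypothetical toy taxonomy (NOT real WordNet data): each word maps to its
# direct neighbours, with hypernym/hyponym links treated as undirected edges.
TOY_TAXONOMY = {
    "animal": ["mammal"],
    "mammal": ["animal", "feline", "equine"],
    "feline": ["mammal", "cat", "tiger"],
    "equine": ["mammal", "horse", "donkey"],
    "cat": ["feline"],
    "tiger": ["feline"],
    "horse": ["equine"],
    "donkey": ["equine"],
}


def path_length(graph, source, target):
    """Return the number of edges on the shortest path, or None if there is none."""
    if source not in graph or target not in graph:
        return None  # word is outside the vocabulary (the coverage problem)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:  # breadth-first search visits words in order of distance
        word, dist = queue.popleft()
        if word == target:
            return dist
        for neighbour in graph[word]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


print(path_length(TOY_TAXONOMY, "cat", "tiger"))   # 2
print(path_length(TOY_TAXONOMY, "cat", "donkey"))  # 4
print(path_length(TOY_TAXONOMY, "cat", "flickr"))  # None
```

The `None` result for a word missing from the graph mirrors the limited-coverage drawback listed under Cons.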

Page 9

Google Distance

Normalized Google Distance (NGD)
Reflects the co-occurrence (concurrency) of two words in Web documents
Defined as [equation not preserved in this transcript]

Pros and Cons
Pros: Easy to get and huge coverage
Cons: Only reflects concurrency in textual documents. Not really concept distance (semantic relationship)
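
Since the slide's formula did not survive extraction, the sketch below uses the standard NGD definition from Cilibrasi and Vitányi, NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))), where f(·) is a page count and N the number of indexed pages. Whether the slide showed exactly this form is an assumption, and the counts in the example are made up.

```python
import math


def normalized_google_distance(f_x, f_y, f_xy, n_pages):
    """Standard NGD from page counts (assumed to be what the slide defined).

    f_x, f_y : number of pages containing word x, word y
    f_xy     : number of pages containing both words
    n_pages  : total number of indexed pages
    """
    if f_x <= 0 or f_y <= 0 or f_xy <= 0:
        return math.inf  # the words never co-occur
    log_x, log_y, log_xy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(log_x, log_y) - log_xy) / (math.log(n_pages) - min(log_x, log_y))


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
print(normalized_google_distance(1000, 100, 10, 1_000_000))      # approximately 0.5
print(normalized_google_distance(1000, 1000, 1000, 1_000_000))   # 0.0
```

Words that always appear together get distance 0, and the distance grows as the joint count shrinks relative to the individual counts.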

Page 10

Concept Pairs         Google Distance
Airplane – Dog        0.2562
Football – Soccer     0.1905
Horse – Donkey        0.2147
Airplane – Airport    0.3094
Car – Wheel           0.3146

Page 11

Tag Concurrence Distance

Image Tag Concurrence Distance (Qi, Hua, et al. ACMMM07)
Reflects how frequently two tags occur in the same images
Based on the same idea as NGD
Mostly sparse (> 95% of the entries in the similarity matrix are zero)

Pros and Cons
Pros: Images are taken into account
Cons:
a) Tags are sparse, so visual concurrency is not well reflected
b) Training data is difficult to get

Figures: similarity matrix for 500 tags; similarity matrix for 50 tags
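
A minimal sketch of the idea, assuming the NGD-style formula is applied to image counts instead of page counts. The tagged images are a hypothetical toy set, and the function names are illustrative rather than taken from the cited work. It also measures how many tag pairs never co-occur, which is the sparsity problem noted above.

```python
import math
from itertools import combinations

# Hypothetical toy collection: one set of tags per image.
TOY_IMAGES = [
    {"bear", "fur", "grass", "tree"},
    {"polar bear", "water", "sea"},
    {"polar bear", "fighting", "usa"},
    {"airplane", "airport"},
    {"airplane", "sky"},
]


def cooccurrence_counts(images):
    """Count images per tag and images per unordered tag pair."""
    tag_count, pair_count = {}, {}
    for tags in images:
        for tag in tags:
            tag_count[tag] = tag_count.get(tag, 0) + 1
        for a, b in combinations(sorted(tags), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    return tag_count, pair_count


def tag_concurrence_distance(a, b, tag_count, pair_count, n_images):
    """NGD-style distance computed from image counts (an assumed formulation)."""
    f_ab = pair_count.get(tuple(sorted((a, b))), 0)
    if f_ab == 0:
        return math.inf  # the two tags never share an image
    log_a, log_b = math.log(tag_count[a]), math.log(tag_count[b])
    denominator = math.log(n_images) - min(log_a, log_b)
    if denominator == 0:
        return 0.0  # a tag present in every image carries no information
    return (max(log_a, log_b) - math.log(f_ab)) / denominator


def zero_fraction(tag_count, pair_count):
    """Fraction of tag pairs with no co-occurrence at all."""
    n_tags = len(tag_count)
    return 1 - len(pair_count) / (n_tags * (n_tags - 1) // 2)


tag_count, pair_count = cooccurrence_counts(TOY_IMAGES)
print(tag_concurrence_distance("airplane", "airport", tag_count, pair_count, len(TOY_IMAGES)))
print(tag_concurrence_distance("bear", "airplane", tag_count, pair_count, len(TOY_IMAGES)))
print(zero_fraction(tag_count, pair_count))
```

Even in this tiny set most tag pairs have no shared image, so their distance is undefined (infinite here).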

Page 12

Concept Pairs         Google Distance    Tag Concurrence Distance
Airplane – Dog        0.2562             0.8532
Football – Soccer     0.1905             0.1739
Horse – Donkey        0.2147             0.4513
Airplane – Airport    0.3094             0.1833
Car – Wheel           0.3146             0.9617

Page 13

Different Concept Relationships

Synonymy: different words but the same meaning (table tennis – ping-pong)
Visually Similar: similar things or things of the same type (horse – donkey)
Meronymy: part and the whole (car – wheel)
Concurrency: exist at the same scene/place (airplane – airport)

Page 14

Concept Distance

Mine from ontology: WordNet distance is good, but coverage is too low
Mine from text documents: Google distance’s coverage is very high, but it is for the text domain
Mine from image tags: Image tag concurrence distance implicitly uses image information, but tags are too sparse

Page 15

Can we mine concept distance from image content?

Page 16

Some Facts

Semantic concept distance is based on human cognition
80% of human cognition comes from visual information
There are around 2.8 billion photos on Flickr (as of Sep 08)
On average, each Flickr image has around 8 tags

Goal: to mine concept distance from a large tagged image collection based on image content

Example image tags:
bear, fur, grass, tree
polar bear, water, sea
polar bear, fighting, usa

Page 17

Overview of Flickr Distance

Concept A: Airplane

Concept B: Airport

Concept Model A

Concept Model B

Flickr Distance (A, B)

Page 18

Flickr Distance: 0.5151, 0.0315, 0.4231, 0.0576, 0.0708

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency

Page 19

What We Need

R1: A Good Image Collection
Large
High coverage, especially on daily life
With tags

Page 20

What We Need

R2: A Good Concept Representation or Model
Based on image content
Can cover wider concept relationships
Can handle a large concept set

Concept Models
Discriminative: SVM, Boosting, …
Generative
Global Feature / Local Feature
w/o Spatial Relation: Bag-of-Words (pLSA, LDA), …
w/ Spatial Relation: 2D HMM, MRF, …

Page 21

What We Need

Concept Models
Discriminative: SVM, Boosting, …
Generative
Global Feature / Local Feature
w/o Spatial Relation: Bag-of-Words, …
w/ Spatial Relation: 2D HMM, MRF, …

VLM – Visual Language Model
Spatial-relation sensitive
Efficient
Can handle object variations

Page 22

Statistical Language Model

Example sentence: I am talking about statistical language model.

Unigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x)$

Bigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-1})$

Trigram Model: $P(w_x \mid w_1 w_2 \cdots w_n) = P(w_x \mid w_{x-2} w_{x-1})$
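
The three models differ only in how much preceding context they condition on. A minimal sketch, assuming plain maximum-likelihood counts with no smoothing and using the slide's example sentence as the whole training corpus (`train_ngram` is an illustrative name):

```python
from collections import Counter


def train_ngram(tokens, n):
    """Maximum-likelihood n-gram model: P(w_x | previous n-1 words)."""
    context_counts = Counter()
    ngram_counts = Counter()
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])  # empty tuple for the unigram case
        ngram_counts[context + (tokens[i],)] += 1
        context_counts[context] += 1

    def prob(word, context=()):
        context = tuple(context)
        if context_counts[context] == 0:
            return 0.0  # unseen context; a real model would smooth here
        return ngram_counts[context + (word,)] / context_counts[context]

    return prob


tokens = "i am talking about statistical language model".split()
unigram = train_ngram(tokens, 1)
bigram = train_ngram(tokens, 2)
trigram = train_ngram(tokens, 3)

print(unigram("language"))                           # 1/7
print(bigram("language", ["statistical"]))           # 1.0
print(trigram("model", ["statistical", "language"])) # 1.0
```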

Page 24

Performance of VLM

Comparison on Image Categorization
Caltech 8 categories / 5097 images

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14

Page 25

Latent-Topic VLM (1)

Why Latent-Topic

Latent-Topic VLM
Visual variations of a concept are taken as latent topics

$$P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C) = \sum_{k=1}^{K} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, z_k^C)\, P(z_k^C \mid d_j^C)$$

$C$: a concept
$d_j^C$: the $j$-th image in concept $C$
$z_k^C$: the $k$-th latent topic of concept $C$
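
As a worked instance of the mixture above, the sketch below evaluates the image-level trigram probability as a topic-weighted sum. The numbers are hypothetical and the function name is illustrative; estimating the two inputs is the job of training, which is not shown here.

```python
def latent_topic_probability(p_word_given_topic, p_topic_given_image):
    """Mix per-topic conditional probabilities with the image's topic weights.

    p_word_given_topic[k]  : P(w_xy | w_{x-1,y}, w_{x,y-1}, z_k) for topic k
    p_topic_given_image[k] : P(z_k | d_j), assumed to sum to 1 over k
    """
    if len(p_word_given_topic) != len(p_topic_given_image):
        raise ValueError("both lists need one entry per latent topic")
    return sum(pw * pz for pw, pz in zip(p_word_given_topic, p_topic_given_image))


# Hypothetical values for K = 2 latent topics.
print(latent_topic_probability([0.6, 0.1], [0.25, 0.75]))  # approximately 0.225
```

An image dominated by the second topic therefore assigns this visual-word arrangement a probability close to that topic's 0.1.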

Page 26

Latent-Topic VLM (2)

Latent-Topic VLM Training
Solved by the EM algorithm. The objective function is to maximize the joint distribution of the concept and its visual word arrangement $A_w$

$$\text{maximize } p(A_w, C) = \prod_{d_j^C \in C} \prod_{w_{xy} \in d_j^C} P(w_{xy} \mid w_{x-1,y}, w_{x,y-1}, d_j^C)$$

E-step: Estimate the posteriors of the hidden topics
M-step: Maximize the likelihood of visual arrangement

Page 27

Performance of LT-VLM

Comparison on Image Categorization
Caltech 8 categories / 5097 images

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

Page 28

Flickr Distance

Kullback-Leibler (KL) divergence (topic distance)
Good, but not symmetric

$$D_{KL}(P_{z_i^{C_1}} \mid P_{z_j^{C_2}}) = \sum_{l} P_{z_i^{C_1}}(l) \log \frac{P_{z_i^{C_1}}(l)}{P_{z_j^{C_2}}(l)}$$

Jensen-Shannon (JS) divergence (topic distance)
Better, as it is symmetric
And the square root of JS divergence is a metric, so is Flickr Distance

$$D_{JS}(P_{z_i^{C_1}} \mid P_{z_j^{C_2}}) = \frac{1}{2} D_{KL}(P_{z_i^{C_1}} \mid M) + \frac{1}{2} D_{KL}(P_{z_j^{C_2}} \mid M), \qquad M = \frac{1}{2}\left(P_{z_i^{C_1}} + P_{z_j^{C_2}}\right)$$

Flickr Distance (concept distance)

$$D_{Flickr}(C_1, C_2) = \sum_{i=1}^{K} \sum_{j=1}^{K} P(z_i^{C_1} \mid C_1)\, P(z_j^{C_2} \mid C_2)\, D_{JS}(P_{z_i^{C_1}} \mid P_{z_j^{C_2}})$$
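
A minimal numeric sketch of the two topic distances and the weighted sum that combines them. It assumes each latent topic is given as a discrete distribution over the same set of trigrams, uses natural logarithms, and treats the concept distance as the plain double sum over topic pairs with no square root applied; the toy distributions and function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(p | q) for discrete distributions; terms with p(l) = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Symmetric JS divergence via the midpoint distribution M = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def flickr_distance(topics_1, weights_1, topics_2, weights_2):
    """Topic-weighted sum of JS divergences between two concept models.

    topics_c[k]  : trigram distribution of latent topic k of concept c
    weights_c[k] : P(z_k | C_c), the prior of that topic within the concept
    """
    return sum(
        w1 * w2 * js_divergence(t1, t2)
        for t1, w1 in zip(topics_1, weights_1)
        for t2, w2 in zip(topics_2, weights_2)
    )


# Hypothetical concept models, each with K = 2 topics over 3 trigrams.
airplane = [[0.7, 0.2, 0.1], [0.5, 0.4, 0.1]]
airport = [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]
priors = [0.5, 0.5]

print(kl_divergence(airplane[0], airport[1]), kl_divergence(airport[1], airplane[0]))
print(js_divergence(airplane[0], airport[1]), js_divergence(airport[1], airplane[0]))
print(flickr_distance(airplane, priors, airport, priors))
```

The first line prints two different values (KL is asymmetric), while the second prints the same value twice, which is why JS is the one used between topics.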

Page 29

Procedure of Flickr Distance

Concept A: Airplane; Concept B: Airport
Tag search in Flickr
LT-VLM: Concept Model A, Concept Model B
Jensen-Shannon Divergence
Flickr Distance (A, B)

Page 30

Experiments

Evaluation
Objective evaluation
Subjective evaluation

Applications
Concept clustering
Image annotation
Tag recommendation

Page 31

Experiments - Configurations

Images
6,400,000 from Flickr

Concepts
130,000,000 different tags
10,000,000 filtered tags
1,000 randomly-selected tags

Comparison
Normalized Google Distance (NGD)
Tag Concurrence Distance (TCD)
Flickr Distance (FD)

Page 32

Eva1: Subjective Evaluation

Ground-Truth
12 persons are asked to score the semantic correlation of each concept pair
Average scores are taken as ground-truth

Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

Figure: Correct Rate of NGD, TCD, and FD
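
The two steps can be sketched as an order-consistency count. This assumes the ground truth is expressed as a distance (smaller means more related) and that ties in the ground truth are skipped; both are assumptions, and all values below are hypothetical rather than results from the experiments.

```python
from itertools import combinations


def correct_rate(measured, ground_truth):
    """Fraction of distance pairs whose order agrees with the ground truth.

    Both arguments map a concept pair to a distance-like value.
    """
    concept_pairs = sorted(set(measured) & set(ground_truth))
    correct = total = 0
    for p, q in combinations(concept_pairs, 2):  # Step 1: all (D(a,b), D(c,d))
        truth_diff = ground_truth[p] - ground_truth[q]
        if truth_diff == 0:
            continue  # no order to agree with (assumed handling of ties)
        total += 1
        if (measured[p] - measured[q]) * truth_diff > 0:  # Step 2: same order?
            correct += 1
    return correct / total if total else 0.0


# Hypothetical distances for three concept pairs.
measured = {("airplane", "dog"): 0.30, ("car", "wheel"): 0.20, ("horse", "donkey"): 0.25}
truth = {("airplane", "dog"): 0.90, ("car", "wheel"): 0.50, ("horse", "donkey"): 0.10}
print(correct_rate(measured, truth))  # 2 of 3 orderings agree
```

Only the order of the two distances matters, not their values, so measures on different scales can be compared against the same ground truth.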

Page 33

Eva2: Objective Evaluation

Ground-Truth
WordNet Distance
Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all distance pairs D(a,b) and D(c,d)
Step 2: Check whether the order of D(a,b) and D(c,d) is consistent with ground-truth

Figure: Correct Rate of NGD, TCD, and FD

Page 34

App1: Concept Clustering

Concept Clustering
23 concepts; 3 groups – (1) outer space, (2) animal and (3) sports

Normalized Google Distance
Group 1: bears, horses, moon, space
Group 2: bowling, dolphin, donkey, Saturn, sharks, snake, softball, spiders, turtle, Venus, whale, wolf
Group 3: baseball, basketball, football, golf, soccer, tennis, volleyball

Tag Concurrence Distance
Group 1: moon, space, Venus, whale
Group 2: baseball, donkey, softball, wolf
Group 3: basketball, bears, bowling, dolphin, football, golf, horses, Saturn, sharks, soccer, spiders, tennis, turtle, volleyball

Flickr Distance
Group 1: moon, Saturn, space, Venus
Group 2: bears, dolphin, donkey, golf, horses, sharks, spiders, tennis, whale, wolf
Group 3: baseball, basketball, football, snake, soccer, bowling, softball, volleyball

Page 35

App2: Image Annotation

Based on an approach using concept relation
Dual Cross-Media Relevance Model (DCMRM, J. Liu et al. ACMMM 2007)
On 79 concepts / 79,000 images

The number of correctly annotated keywords at the first N words

First N words    1     2     3     4
NGD-DCMRM        55    212   212   301
TC-DCMRM         53    186   193   310
FD-DCMRM         57    354   423   960

Figure: Total number of correct keywords for NGD-DCMRM, TC-DCMRM, and FD-DCMRM

Page 36

App3: Tag Recommendation

To Improve Tagging Quality
Eliminating tag incompleteness, noise, and ambiguity
500 images / 10 recommended tags per image

Method                      Precision @ 10
NGD                         0.652
Tag Concurrence Distance    0.665
Flickr Distance             0.758

Page 39

Summary

Flickr Distance

A novel approach to discover semantic relationships from image content
based on real-life images from the Web
based on collective intelligence from grassroots

A distance more consistent with human perception

A measurement more effective in many applications

Page 40

Future Work

Flickr Distance as a Service.

Page 41

Thank You

Page 42

Backup

Page 43

TagNet

TagNet – Visual Concept Net

Can be used in many applications
Knowledge representation
Concept learning
Multimedia retrieval
...

$$G = (V, E, W)$$

$v \in V$: concept (node)
$e \in E$: semantic relationship (edge)
$w \in W$: Flickr Distance (weight)
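
One way to hold such a graph in code is sketched below, with concepts as nodes and an undirected weighted edge per concept pair. The `TagNet` class, its method names, and the distances are all hypothetical illustrations, not an interface described in the slides.

```python
class TagNet:
    """Undirected weighted graph G = (V, E, W) over concepts."""

    def __init__(self):
        self.nodes = set()   # V: concepts
        self.weights = {}    # E and W: frozenset({a, b}) -> distance

    def add_edge(self, a, b, distance):
        if a == b:
            raise ValueError("an edge needs two different concepts")
        self.nodes.update((a, b))
        self.weights[frozenset((a, b))] = distance

    def distance(self, a, b):
        """Return the stored distance, or None if the pair has no edge."""
        return self.weights.get(frozenset((a, b)))

    def neighbours(self, concept):
        """Concepts linked to `concept`, closest first."""
        linked = []
        for edge, weight in self.weights.items():
            if concept in edge:
                (other,) = edge - {concept}
                linked.append((other, weight))
        return sorted(linked, key=lambda item: item[1])


net = TagNet()
net.add_edge("airplane", "airport", 0.06)  # hypothetical distances
net.add_edge("airplane", "dog", 0.52)
net.add_edge("car", "wheel", 0.07)
print(net.neighbours("airplane"))  # [('airport', 0.06), ('dog', 0.52)]
```

Sorting neighbours by weight gives the closest related concepts first, which is the kind of lookup a retrieval or recommendation application would need.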

Page 44

TagNet

Visualization
The bigger the distance, the longer the edge
Using a tool called NetDraw, provided by the International Network for Social Network Analysis

Page 45

Outline

Motivation

Overview

Visual Language Model

Flickr Distance Calculation

Evaluations and Applications

Page 46

Semantic Relationship Is Important

Many efforts on using semantic relationships
GJ Qi et al. Correlative Multi-Label Video Annotation. ACM MM 2007.
R. Datta et al. Image Retrieval: Ideas, Influences and the Trends of the New Age. ACM Computing Surveys, 2008.
L. Leslie et al. Annotation of Paintings with High-Level Semantic Concepts Using Transductive Inference and Ontology-based Concept Disambiguation. ACM MM 2007.
J. Yu et al. Semantic Subspace Projection and Its Application in Image Retrieval. IEEE T CSVT 2008.

Applications of semantic relationships
Natural language processing
Object detection
Concept detection
Multimedia retrieval

Page 47

Discussion

Why can VLM divergence estimate concept distance?

Why does FD work well even when tags are not complete?

VLM: distribution of trigrams
Computer: room patterns, computer patterns, other patterns
TV: room patterns, TV patterns, other patterns
Office: room patterns, screen patterns, other patterns

Flickr Distance is able to cover the four different semantic relationships: Synonymy, Visually Similar, Meronymy, and Concurrency

Page 49

Visual Word Generation

Typical methods
SIFT + Clustering/PCA

Our method
Patch + Texture Direction Histogram + Hashing
Efficient, low-dimension, and rotation-invariant
Only needs 1/20 of the computation of the SIFT feature

Pipeline: Image Patch, Patch Gradient, Texture Histogram, Hashing, Visual Word

1 0 0 1 0 0 1 0

Page 50

Performance of VLM

Comparison on Image Categorization
Caltech 8 categories / 5097 images (L. Wu, et al. MIR 2007/T-MM 2008)

Method        Accuracy (%)    Training Time (sec/image)
pLSA (BOW)    59              1.11
LDA (BOW)     64              2.44
2D MHMM       88              0.44
SVM           90              0.84
VLM           90              0.14
LT-VLM        94              0.24

Page 51

Eva1: Objective Evaluation

Ground-Truth
WordNet Distance
Only 497 concepts (overlap of WordNet and the 1000 concepts)

Evaluate Accuracy of “Relative Distance Pairs”
Step 1: Find all concept triples (A,B,C)
Step 2: Get 6 distance pairs for each triple (consider asymmetry)
Step 3: Compute the correct ratio of each distance pair in terms of order (not value), compared with the ground-truth distance pair

Example (NGD vs. Ground-Truth on a triple A, B, C): (AB,AC) x; (AB,BC) √; (AC,BC) √


Page 53

Future Work

Scalability
Large-scale testing
TagNet as a service

Other data
“PicNet Distance” based on different dataset / Optimizing dataset
Integrating text/tag concurrency distance and Flickr Distance

Concept modeling
Handling scale variations (multiple-resolution)
New models

More applications
Tag ranking
Query suggestions