Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, Xindong Wu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013
Introduction
Tag-based social image search
• Social media data (Flickr, YouTube, etc.)
• Associated with user-generated tags and meta information (date, location, etc.)

• Conventional tag-based social image search
• Too much noise in tags
• Lacks an optimal ranking strategy (e.g. Flickr: time-based ranking, interestingness-based ranking)
• Existing relevance-based ranking methods
• Explore visual content and tags separately or sequentially
Proposed scheme
• A hypergraph-based approach to simultaneously utilize visual information and tags
Vertex: social image
Hyperedge: visual word / tag
Learn the weights (importance of different visual words and tags)
Relevance scores of images
Related works
Social image search
• Separated methods
• Only the textual content or the visual content is employed for tag analysis
• Useful information is missing

Social image search
• Sequential methods
• The visual content and the tags are sequentially employed for image search
• The correlation between visual content and tags is not exploited

Social image search
• Joint method
Hypergraph learning
• A hypergraph is a generalization of a graph in which an edge can connect multiple vertices
• Used for data mining and information retrieval tasks
• Effective in capturing higher-order relationships
Hypergraph analysis
Definition
Image from Wikipedia
• Vertex set
• Hyperedge set: a hyperedge is able to link more than two vertices
• Edge weight set
Hypergraph
Hypergraph analysis
• Learning with hypergraphs
• Binary classification with a hypergraph
• The normalized Laplacian method is formulated as a regularization framework:

argmin_f { λ R_emp(f) + Ω(f) }

where Ω(f) is the regularizer, R_emp(f) is the empirical loss, λ is the weighting parameter, and f is the to-be-learned classification function.
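Under the common choices Ω(f) = fᵀΔf (with Δ a graph/hypergraph Laplacian, not stated on this slide) and R_emp(f) = ||f − y||² for an initial label vector y, the minimizer has a closed form; a minimal numpy sketch under those assumptions:

```python
import numpy as np

def regularized_scores(delta, y, lam=1.0):
    """Minimize f^T @ delta @ f + lam * ||f - y||^2 in closed form.

    Setting the gradient to zero gives (delta + lam * I) f = lam * y.
    delta: (n, n) Laplacian matrix; y: (n,) initial label vector.
    """
    n = delta.shape[0]
    return lam * np.linalg.solve(delta + lam * np.eye(n), y)
```

With delta = 0 (no smoothness penalty) the solution simply reproduces y; a larger Laplacian term pulls connected vertices toward shared scores.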
Visual-textual joint relevance learning
Hypergraph construction
• Vertex construction
• Vertices: the social image set
• The number of vertices in the hypergraph equals the number of images in the dataset.
Hypergraph construction
• Hyperedge construction
• Feature 1: visual content
• Bag of Visual Words
• Extract local SIFT descriptors for each image
• Train visual vocabularies with the descriptors

f_i^bow(k, 1) = 1 if the i-th image has the k-th visual word, 0 otherwise
Hypergraph construction
• Hyperedge construction
• Feature 2: textual information
• Bag of Textual Words
• Tags in each image are ranked by TagRanking
• For further processing, only the top tags of each image are kept
• For further hyperedge construction, only the tags with the highest TF-IDF scores are kept in the database

f_i^tag(k, 1) = 1 if the i-th image has the k-th tag, 0 otherwise
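Both indicator features above share the same form; a minimal sketch (the vocabularies and tag sets below are illustrative assumptions, not from the paper's dataset):

```python
def indicator(items, vocab):
    """Binary indicator vector: entry k is 1 iff the k-th vocabulary
    entry (a visual word or a tag) occurs in the image's item set."""
    present = set(items)
    return [1 if v in present else 0 for v in vocab]

# f_i^bow for an image containing visual words "w1" and "w3"
f_bow = indicator({"w1", "w3"}, ["w1", "w2", "w3"])
# f_i^tag for an image tagged "car" and "road"
f_tag = indicator({"car", "road"}, ["car", "road", "sky"])
```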
Hypergraph construction
• Hyperedge construction
• If two selected images contain the same visual word, they are connected by a hyperedge.
• If two selected images contain the same tag, they are connected by a hyperedge.

• If f_i^bow(k, 1) = 1 and f_j^bow(k, 1) = 1, x_i and x_j are connected (visual content based hyperedges).
• If f_i^tag(k, 1) = 1 and f_j^tag(k, 1) = 1, x_i and x_j are connected (tag-based hyperedges).

One hyperedge is built per visual word and one per tag, so the two counts add up to the total number of hyperedges.
Hypergraph construction
Example of textual hyperedge construction
Example of visual hyperedge construction
Example of the connection between two images
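The construction can be sketched with a toy incidence matrix (the images, visual words, and tags below are made-up examples, not the paper's data):

```python
import numpy as np

# Hypothetical toy data: per-image visual words and tags.
images = [
    {"words": {"w1", "w2"}, "tags": {"car"}},
    {"words": {"w2"},       "tags": {"car", "road"}},
    {"words": {"w3"},       "tags": {"sky"}},
]
word_vocab = ["w1", "w2", "w3"]
tag_vocab = ["car", "road", "sky"]

# One hyperedge per visual word plus one per tag.
edges = [("words", w) for w in word_vocab] + [("tags", t) for t in tag_vocab]

# Incidence matrix H: H[i, k] = 1 iff image i lies on hyperedge k.
H = np.array([[1 if name in img[kind] else 0 for kind, name in edges]
              for img in images])

# Two images are connected iff they share at least one hyperedge.
shared = H @ H.T
```

Here images 0 and 1 are connected through two hyperedges (the visual word "w2" and the tag "car"), while image 2 shares no hyperedge with image 0.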
Social image relevance learning
• Social image search task
• A binary classification problem
• Measure the relevance scores of all vertices in the hypergraph
• Transductive inference is also formulated as a regularization framework
• Objective function

• The regularizer term indicates that highly related vertices should have close label results
(terms: regularizer, empirical loss, weight regularizer; variables: weight vector w, to-be-learned relevance score vector f)
Social image relevance learning
• Objective function

argmin_{f, w} { f^T Δ f + λ ||f − y||^2 + μ Σ_i w(e_i)^2 }  s.t.  Σ_i w(e_i) = 1

(Δ: the normalized hypergraph Laplacian; y: the n × 1 initial label vector)

• The empirical loss term guarantees that the newly generated labeling results are not far from the initial label information.
Optimization
• Alternating optimization strategy
• There are two to-be-learned variables, w and f: we fix one and optimize the other at each step.
• Using this iterative optimization method, w and f are obtained.
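The alternation can be sketched as follows. This is a sketch of the strategy only, not the paper's analytic update rules: the f-step uses the closed-form solution of the quadratic objective, while the w-step here uses a simple finite-difference gradient step projected back onto the simplex Σw = 1 (both the learning rate and the projection are my assumptions):

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian
    Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    dv = H @ w                      # weighted vertex degrees
    de = H.sum(axis=0)              # hyperedge degrees
    dv_is = 1.0 / np.sqrt(np.maximum(dv, 1e-12))
    de_inv = 1.0 / np.maximum(de, 1e-12)
    A = (dv_is[:, None] * H * (w * de_inv)) @ (H.T * dv_is)
    return np.eye(H.shape[0]) - A

def objective(H, w, f, y, lam, mu):
    delta = hypergraph_laplacian(H, w)
    return f @ delta @ f + lam * np.sum((f - y) ** 2) + mu * np.sum(w ** 2)

def alternating_opt(H, y, lam=1.0, mu=0.1, iters=5, lr=0.05, eps=1e-4):
    """Alternate a closed-form f-step with a projected w-step."""
    n, n_e = H.shape
    w = np.full(n_e, 1.0 / n_e)     # start from uniform hyperedge weights
    f = y.astype(float).copy()
    for _ in range(iters):
        # f-step: minimize f^T Delta f + lam * ||f - y||^2 in closed form
        delta = hypergraph_laplacian(H, w)
        f = lam * np.linalg.solve(delta + lam * np.eye(n), y)
        # w-step: finite-difference gradient of the objective w.r.t. w
        base = objective(H, w, f, y, lam, mu)
        grad = np.zeros(n_e)
        for k in range(n_e):
            w2 = w.copy()
            w2[k] += eps
            grad[k] = (objective(H, w2, f, y, lam, mu) - base) / eps
        w = np.maximum(w - lr * grad, 1e-8)
        w /= w.sum()                # project back onto sum(w) = 1
    return f, w
```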
Probabilistic explanation
• Probabilistic perspective
• Derive the optimal f and w with the maximum posterior probability given the samples X and the label vector y
• Equivalent to the objective function and its constraint
Pseudo-relevant sample selection
• Pseudo-relevant samples
• Associated with the query tag
• Have high relevance probabilities
• Are not far from the final results
• Used for noise reduction

Pseudo-relevant sample selection
• Semantic relevance measuring
• All social images associated with the query tag are ranked in descending order
• The top K results are selected as the pseudo-relevant images
• Semantic similarity
• Flickr Distance (FD) between two tags
• Based on a latent-topic-based visual language model

s(x_i, t_q) = (1 / n_i) Σ_{t ∈ T_i} s_tag(t_q, t),  where  s_tag(t_1, t_2) = exp(−FD(t_1, t_2))
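A minimal sketch of this score, assuming `fd` is some callable returning the Flickr Distance between two tags (the toy distance below is an illustrative stand-in, not the actual Flickr Distance):

```python
import math

def semantic_relevance(image_tags, query_tag, fd):
    """s(x_i, t_q): average over the image's tags T_i of
    s_tag(t_q, t) = exp(-FD(t_q, t))."""
    return sum(math.exp(-fd(query_tag, t)) for t in image_tags) / len(image_tags)

# Toy distance: 0 for identical tags, 1 otherwise.
toy_fd = lambda a, b: 0.0 if a == b else 1.0
score = semantic_relevance(["car", "road"], "car", toy_fd)
```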
Experiments
Experimental settings
• Dataset: Flickr dataset (104,000 images, 83,999 tags) + NUS-WIDE (370K+ images)
• Labeling: three relevance levels: very relevant (2), relevant (1), and irrelevant (0)
• Compared algorithms
• Graph-based semi-supervised learning (Graph)
• Sequential social image relevance learning (Sequential)
• Tag ranking (TagRanking)
• Tag relevance combination (Uniform Tagger)
• Hypergraph-based relevance learning (HG)
• HG + hyperedge weight estimation (HG+WE)
• HG + WE (visual contents only)
• HG + WE (textual contents only)
• Performance evaluation metric
• Normalized Discounted Cumulative Gain (NDCG)
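As a reference point, one common graded-relevance NDCG variant (the exact gain/discount form used in the paper may differ):

```python
import math

def dcg_at_k(rels, k):
    """DCG with gain 2^rel - 1 and a log2 position discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG of the ranking, normalized by the DCG of the
    ideal (relevance-sorted) ordering of the same items."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

With the three relevance levels above, a perfect ranking (all 2s before 1s before 0s) scores 1.0 at every depth.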
The NDCG@20 Results of different methods
Method          NDCG@20
HG+WE           0.8814
HG+WE (tag)     0.8578
HG+WE (visual)  0.8463
HG              0.7418
TagRanking      0.6281
Uniform Tagger  0.5994
Sequential      0.5778
Graph           0.5727
The Average NDCG@20 Results
Average NDCG@k comparison
• This approach consistently outperforms the other methods at every NDCG depth
• Top results obtained by different methods for the query "weapon"
• The final ranking list can preserve images from all different meanings

• Top results obtained by different methods for the query "apple"
• The proposed method can return relevant results with different meanings
The effects of hyperedge weight learning
Top 100 visual words with the highest weights after the hypergraph learning process
The effects of hyperedge weight learning
Ten tags with the highest weights after the hypergraph learning process for the queries (a) car and (b) weapon.
Variation of weighting parameters
Average NDCG@20 performance curves with respect to the variation of λ and μ.
Variation of dictionary size
NDCG@20 comparison of the proposed method with different sizes of the tag and visual word dictionaries.
Variation of max. number of tags
NDCG@20 comparison of the proposed method with different selections of the maximum number of tags per image.
This parameter is employed to filter noisy tags.
Computational cost comparison
Conclusion
Conclusion
• Proposal: joint utilization of both visual contents and tags by a hypergraph and relevance learning procedure for social image search

• Consideration of the weights of hyperedges
• Differs from previous hypergraph learning algorithms
• Minimizes the effects of uninformative features

• Future work
• Diversity of search results: the next issue
Thank you !
Q&A