Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Yue Gao, Meng Wang, Zheng-Jun Zha, Jialie Shen, Xuelong Li, Xindong Wu
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2013
Introduction
Tag-based social image search
• Social media data (Flickr, YouTube, etc.)
• Associated with user-generated tags and meta information (date, location, etc.)

• Conventional tag-based social image search
• Too much noise in tags
• Lacks an optimal ranking strategy (e.g. Flickr: time-based ranking, interestingness-based ranking)
• Existing relevance-based ranking methods
• Explore visual content and tags separately or sequentially
Proposed scheme
• A hypergraph-based approach to simultaneously utilize visual information and tags
Vertex: social image
Hyperedge: visual word / tag
Learn the weights (importance of different visual words and tags)
Relevance scores of images
Related works
Social image search
• Separated methods
• Only the textual content or the visual content is employed for tag analysis
• Useful information is missing

Social image search
• Sequential methods
• The visual content and the tags are sequentially employed for image search
• The correlation between visual content and tags is not exploited

Social image search
• Joint method
Hypergraph learning
• A hypergraph is a generalization of a graph in which an edge can connect multiple vertices
• Used for data mining and information retrieval tasks
• Effective in capturing higher-order relationships
Hypergraph analysis
Definition
Image from Wikipedia
• Vertex set
• Hyperedge set: a hyperedge is able to link more than two vertices
• Edge weight set
Hypergraph
Hypergraph analysis
• Learning with hypergraphs
• Binary classification with a hypergraph
• The normalized Laplacian method is formulated as a regularization framework:

argmin_f { λ R_emp(f) + Ω(f) }

where Ω(f) is the regularizer, R_emp(f) is the empirical loss, λ is the weighting parameter, and f is the to-be-learned classification function.
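Under the common choices Ω(f) = fᵀΔf (with Δ a graph/hypergraph Laplacian, not stated on this slide) and R_emp(f) = ||f − y||² for an initial label vector y, the minimizer has a closed form; a minimal numpy sketch under those assumptions:

```python
import numpy as np

def regularized_scores(delta, y, lam=1.0):
    """Minimize f^T @ delta @ f + lam * ||f - y||^2 in closed form.

    Setting the gradient to zero gives (delta + lam * I) f = lam * y.
    delta: (n, n) Laplacian matrix; y: (n,) initial label vector.
    """
    n = delta.shape[0]
    return lam * np.linalg.solve(delta + lam * np.eye(n), y)
```

With delta = 0 (no smoothness penalty) the solution simply reproduces y; a larger Laplacian term pulls connected vertices toward shared scores.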
Visual-textual joint relevance learning
Hypergraph construction
• Vertex construction
• Vertices: the social image set
• The number of vertices in the hypergraph equals the number of images in the dataset.
Hypergraph construction
• Hyperedge construction
• Feature 1: visual content
• Bag of Visual Words
• Extract local SIFT descriptors for each image
• Train visual vocabularies with the descriptors

f_i^bow(k, 1) = 1 if the i-th image has the k-th visual word, 0 otherwise
Hypergraph construction
• Hyperedge construction
• Feature 2: textual information
• Bag of Textual Words
• Tags in each image are ranked by TagRanking
• For further processing, only the top tags of each image are kept
• For further hyperedge construction, only the tags with the highest TF-IDF scores are kept in the database

f_i^tag(k, 1) = 1 if the i-th image has the k-th tag, 0 otherwise
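Both indicator features above share the same form; a minimal sketch (the vocabularies and tag sets below are illustrative assumptions, not from the paper's dataset):

```python
def indicator(items, vocab):
    """Binary indicator vector: entry k is 1 iff the k-th vocabulary
    entry (a visual word or a tag) occurs in the image's item set."""
    present = set(items)
    return [1 if v in present else 0 for v in vocab]

# f_i^bow for an image containing visual words "w1" and "w3"
f_bow = indicator({"w1", "w3"}, ["w1", "w2", "w3"])
# f_i^tag for an image tagged "car" and "road"
f_tag = indicator({"car", "road"}, ["car", "road", "sky"])
```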
Hypergraph construction
• Hyperedge construction
• If two selected images contain the same visual word, they are connected by a hyperedge.
• If two selected images contain the same tag, they are connected by a hyperedge.

• If f_i^bow(k, 1) = 1 and f_j^bow(k, 1) = 1, x_i and x_j are connected (visual content based hyperedges).
• If f_i^tag(k, 1) = 1 and f_j^tag(k, 1) = 1, x_i and x_j are connected (tag-based hyperedges).

One hyperedge is built per visual word and one per tag, so the two counts add up to the total number of hyperedges.
Hypergraph construction
Example of textual hyperedge construction
Example of visual hyperedge construction
Example of the connection between two images
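The construction can be sketched with a toy incidence matrix (the images, visual words, and tags below are made-up examples, not the paper's data):

```python
import numpy as np

# Hypothetical toy data: per-image visual words and tags.
images = [
    {"words": {"w1", "w2"}, "tags": {"car"}},
    {"words": {"w2"},       "tags": {"car", "road"}},
    {"words": {"w3"},       "tags": {"sky"}},
]
word_vocab = ["w1", "w2", "w3"]
tag_vocab = ["car", "road", "sky"]

# One hyperedge per visual word plus one per tag.
edges = [("words", w) for w in word_vocab] + [("tags", t) for t in tag_vocab]

# Incidence matrix H: H[i, k] = 1 iff image i lies on hyperedge k.
H = np.array([[1 if name in img[kind] else 0 for kind, name in edges]
              for img in images])

# Two images are connected iff they share at least one hyperedge.
shared = H @ H.T
```

Here images 0 and 1 are connected through two hyperedges (the visual word "w2" and the tag "car"), while image 2 shares no hyperedge with image 0.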
Social image relevance learning
• Social image search task
• A binary classification problem
• Measure the relevance scores of all vertices in the hypergraph
• Transductive inference is also formulated as a regularization framework
• Objective function

• The regularizer term indicates that highly related vertices should have close label results
(terms: regularizer, empirical loss, weight regularizer; variables: weight vector w, to-be-learned relevance score vector f)
Social image relevance learning
• Objective function

argmin_{f, w} { f^T Δ f + λ ||f − y||^2 + μ Σ_i w(e_i)^2 }  s.t.  Σ_i w(e_i) = 1

(Δ: the normalized hypergraph Laplacian; y: the n × 1 initial label vector)

• The empirical loss term guarantees that the newly generated labeling results are not far from the initial label information.
Optimization
• Alternating optimization strategy
• There are two to-be-learned variables, w and f: we fix one and optimize the other at each step.
• Using this iterative optimization method, w and f are obtained.
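The alternation can be sketched as follows. This is a sketch of the strategy only, not the paper's analytic update rules: the f-step uses the closed-form solution of the quadratic objective, while the w-step here uses a simple finite-difference gradient step projected back onto the simplex Σw = 1 (both the learning rate and the projection are my assumptions):

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalized hypergraph Laplacian
    Delta = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    dv = H @ w                      # weighted vertex degrees
    de = H.sum(axis=0)              # hyperedge degrees
    dv_is = 1.0 / np.sqrt(np.maximum(dv, 1e-12))
    de_inv = 1.0 / np.maximum(de, 1e-12)
    A = (dv_is[:, None] * H * (w * de_inv)) @ (H.T * dv_is)
    return np.eye(H.shape[0]) - A

def objective(H, w, f, y, lam, mu):
    delta = hypergraph_laplacian(H, w)
    return f @ delta @ f + lam * np.sum((f - y) ** 2) + mu * np.sum(w ** 2)

def alternating_opt(H, y, lam=1.0, mu=0.1, iters=5, lr=0.05, eps=1e-4):
    """Alternate a closed-form f-step with a projected w-step."""
    n, n_e = H.shape
    w = np.full(n_e, 1.0 / n_e)     # start from uniform hyperedge weights
    f = y.astype(float).copy()
    for _ in range(iters):
        # f-step: minimize f^T Delta f + lam * ||f - y||^2 in closed form
        delta = hypergraph_laplacian(H, w)
        f = lam * np.linalg.solve(delta + lam * np.eye(n), y)
        # w-step: finite-difference gradient of the objective w.r.t. w
        base = objective(H, w, f, y, lam, mu)
        grad = np.zeros(n_e)
        for k in range(n_e):
            w2 = w.copy()
            w2[k] += eps
            grad[k] = (objective(H, w2, f, y, lam, mu) - base) / eps
        w = np.maximum(w - lr * grad, 1e-8)
        w /= w.sum()                # project back onto sum(w) = 1
    return f, w
```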
Probabilistic explanation
• Probabilistic perspective
• Derive the optimal f and w with the maximum posterior probability given the samples X and the label vector y
• Equivalent to the objective function and its constraint
Pseudo-relevant sample selection
• Pseudo-relevant samples
• Associated with the query tag
• Have high relevance probabilities
• Are not far from the final results
• Used for noise reduction

Pseudo-relevant sample selection
• Semantic relevance measuring
• All social images associated with the query tag are ranked in descending order
• The top K results are selected as the pseudo-relevant images
• Semantic similarity
• Flickr Distance (FD) between two tags
• Based on a latent-topic-based visual language model

s(x_i, t_q) = (1 / n_i) Σ_{t ∈ T_i} s_tag(t_q, t),  where  s_tag(t_1, t_2) = exp(−FD(t_1, t_2))
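A minimal sketch of this score, assuming `fd` is some callable returning the Flickr Distance between two tags (the toy distance below is an illustrative stand-in, not the actual Flickr Distance):

```python
import math

def semantic_relevance(image_tags, query_tag, fd):
    """s(x_i, t_q): average over the image's tags T_i of
    s_tag(t_q, t) = exp(-FD(t_q, t))."""
    return sum(math.exp(-fd(query_tag, t)) for t in image_tags) / len(image_tags)

# Toy distance: 0 for identical tags, 1 otherwise.
toy_fd = lambda a, b: 0.0 if a == b else 1.0
score = semantic_relevance(["car", "road"], "car", toy_fd)
```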
Experiments
Experimental settings
• Dataset: Flickr dataset (104,000 images, 83,999 tags) + NUS-WIDE (370K+ images)
• Labeling: three relevance levels: very relevant (2), relevant (1), and irrelevant (0)
• Compared algorithms
• Graph-based semi-supervised learning (Graph)
• Sequential social image relevance learning (Sequential)
• Tag ranking (TagRanking)
• Tag relevance combination (Uniform Tagger)
• Hypergraph-based relevance learning (HG)
• HG + hyperedge weight estimation (HG+WE)
• HG + WE (visual contents only)
• HG + WE (textual contents only)
• Performance evaluation metric
• Normalized Discounted Cumulative Gain (NDCG)
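As a reference point, one common graded-relevance NDCG variant (the exact gain/discount form used in the paper may differ):

```python
import math

def dcg_at_k(rels, k):
    """DCG with gain 2^rel - 1 and a log2 position discount."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k):
    """NDCG@k: DCG of the ranking, normalized by the DCG of the
    ideal (relevance-sorted) ordering of the same items."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

With the three relevance levels above, a perfect ranking (all 2s before 1s before 0s) scores 1.0 at every depth.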
The NDCG@20 Results of different methods
Method          NDCG@20
HG+WE           0.8814
HG+WE (tag)     0.8578
HG+WE (visual)  0.8463
HG              0.7418
TagRanking      0.6281
Uniform Tagger  0.5994
Sequential      0.5778
Graph           0.5727
The Average NDCG@20 Results
Average NDCG@k comparison
• This approach consistently outperforms the other methods at every NDCG depth
• Top results obtained by different methods for the query "weapon"
• The final ranking list can preserve images from all different meanings

• Top results obtained by different methods for the query "apple"
• The proposed method can return relevant results with different meanings
The effects of hyperedge weight learning
Top 100 visual words with the highest weights after the hypergraph learning process
The effects of hyperedge weight learning
Ten tags with the highest weights after the hypergraph learning process for the queries (a) car and (b) weapon.
Variation of weighting parameters
Average NDCG@20 performance curves with respect to the variation of λ and μ.
Variation of dictionary size
NDCG@20 comparison of the proposed method with different sizes of the tag and visual word dictionaries.
Variation of max. number of tags
NDCG@20 comparison of the proposed method with different selections of the maximum number of tags per image.
This parameter is employed to filter noisy tags.
Computational cost comparison
Conclusion
Conclusion
• Proposal: joint utilization of both visual contents and tags by a hypergraph and relevance learning procedure for social image search

• Consideration of the weights of hyperedges
• Differs from previous hypergraph learning algorithms
• Minimizes the effects of uninformative features

• Future work
• Diversity of search results: the next issue
Thank you !
Q&A