Source: fengzheyun.github.io/downloads/papers/eccv14_poster.pdf


Image Tag Completion by Noisy Matrix Recovery

Zheyun Feng∗, Songhe Feng‡, Rong Jin∗, Anil K. Jain∗

∗Michigan State University, ‡Beijing Jiaotong University

Background and Motivation

Simultaneously fill in missing tags and remove noisy tags for images.

Motivation:
* A large number of images carry incomplete and inaccurate tags;
* Tag-based tasks, e.g., tag-based image retrieval, are popular.

Limitations of existing image tagging algorithms:
* They deal with only one of the two problems;
* They require a large number of training images with complete and accurate tags;
* They offer no principled approach to capturing the correlation among tags.

Proposed TCMR Algorithm

+ Convex optimization → computationally efficient;
+ Low-rank enforcement → key assumption in topic models;
+ Graph Laplacian exploration → consistency between tags and visual cues;
+ Provides a theoretical guarantee for image tag completion for the first time.

Two Assumptions

+ Idea of language model: the observed tags of each image are drawn independently from a multinomial distribution.
  * The number of observed tags ($m_*$) is limited;
  * The number of parameters to be estimated is significantly larger than $m_*$.

+ Low-rank matrix recovery: the tags of any image are sampled from a mixture of a small number of multinomial distributions.
  → The recovered tag matrix has to be of low rank.
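The low-rank assumption can be illustrated with a small simulation. The sketch below is not from the poster: the sizes, the Dirichlet draws, and the helper name `sample_observed_tags` are illustrative assumptions, and each image is stored as one row. It mixes k topic-level multinomials into per-image tag distributions and checks that the resulting matrix has rank at most k, while each image reveals at most $m_*$ of its m parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k, m_star = 200, 50, 3, 5  # hypothetical sizes: images, tags, topics, tags per image

# k topic-level multinomials over the m tags, and per-image mixing weights.
topics = rng.dirichlet(np.ones(m), size=k)    # shape (k, m), each row sums to 1
weights = rng.dirichlet(np.ones(k), size=n)   # shape (n, k), each row sums to 1

# Row i of P is the multinomial that generates the tags of image i.
P = weights @ topics                          # shape (n, m), rank at most k


def sample_observed_tags(p, m_star, rng):
    """Draw m_star tags independently from p and return a 0/1 tag vector.

    Repeated draws of the same tag collapse, so an image may end up with
    fewer than m_star distinct tags.
    """
    counts = rng.multinomial(m_star, p)
    return (counts > 0).astype(float)


D = np.array([sample_observed_tags(p, m_star, rng) for p in P])

print("rank of P:", np.linalg.matrix_rank(P))
print("parameters per image:", m, "| distinct observed tags per image: at most", m_star)
```

The gap between m parameters and at most $m_*$ observations per image is what makes per-image estimation unreliable; the shared low-rank structure is what the recovery step exploits.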

Notation

* $m$, $m_*$: the number of unique tags and the number of tags assigned to each image;
* $D = \{d_1, \dots, d_n\}$: the tagged image set, where $d_i$ is the $i$-th tag vector;
* $P = (p_1, \dots, p_n)$: the multinomial distributions for all images;
* $p_i$: the multinomial distribution that generates the tags in $d_i$;
* $|Q|_{tr}$, $|Q|_1$: the nuclear (trace) norm and the $\ell_1$ norm of a matrix.

Tag Completion by Noisy Matrix Recovery (TCMR)

Recover the multinomial probability matrix $P$ by combining maximum likelihood estimation with low-rank matrix recovery theory:

$$\min_{Q \in \Delta} \; L(Q) := -\sum_{i=1}^{n} \sum_{j=1}^{m} \frac{d_{i,j}}{m_*} \log Q_{i,j} + \varepsilon |Q|_{tr},$$

where $\Delta = \{ Q \in (0,1)^{m \times n} : Q_{*,i}^{\top} \mathbf{1} = 1, \; i \in [1, n] \}$.

+ Left term: ensures consistency between the optimal $\hat{Q}$ and the observed tags;
+ Right term: enforces the tag matrix to be low rank.

✓ Entries sampled from unknown multinomial distributions → likelihood;
✗ Entries sampled uniformly at random from a given matrix → square loss.
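As a rough illustration, the objective above can be evaluated directly with NumPy. This is a minimal sketch, not the authors' implementation: the function name `tcmr_objective` and the toy data are hypothetical, and Q and D are stored with one row per image so that the entry (i, j) matches the indices in the sum.

```python
import numpy as np


def tcmr_objective(Q, D, m_star, eps):
    """Evaluate L(Q) = -sum_ij (d_ij / m_star) * log(Q_ij) + eps * |Q|_tr.

    Q : (n, m) array with entries in (0, 1), each row summing to 1
        (one row per image; this layout is an assumption).
    D : (n, m) 0/1 array of observed tags.
    """
    neg_log_likelihood = -np.sum(D * np.log(Q)) / m_star
    nuclear_norm = np.linalg.norm(Q, ord="nuc")  # sum of singular values
    return neg_log_likelihood + eps * nuclear_norm


# Toy data: 3 images, 4 tags, 2 observed tags per image, uniform Q.
D = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
Q_uniform = np.full((3, 4), 0.25)
value = tcmr_objective(Q_uniform, D, m_star=2, eps=0.5)
print("objective at uniform Q:", value)
```

The likelihood term only touches entries where a tag was observed, which is why the nuclear-norm term is needed to tie the unobserved entries together.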

Theoretical Guarantee of TCMR

Theorem. Let $r$ be the rank of the matrix $P$, $N$ the total number of observed tags, and $\hat{Q}$ the optimal estimate of $P$. Assume $N \ge \Omega(n \log(n+m))$, and denote by $\mu_-$ and $\mu_+$ the lower and upper bounds on the probabilities in $P$. Then, with high probability,

$$\frac{1}{n} |\hat{Q} - P|_1 \le O\!\left( \frac{r n \theta^2 \log(n+m)}{N} \right), \quad \text{where } \theta^2 := \frac{\mu_+ |P\mathbf{1}|_\infty}{n \mu_-^2} \le \frac{\mu_+^2}{\mu_-^2}.$$

+ Recovery error: $O(rn \log(n+m)/N)$;
+ The tag matrix can be accurately recovered when $N \ge \Omega(rn \log(n+m))$.
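The upper bound on $\theta^2$ and the sample-size condition both follow in one step from the quantities in the theorem. The derivation below uses only those symbols, except for $\delta$, a hypothetical target accuracy introduced here for illustration; constants hidden by the $O(\cdot)$ notation are ignored.

```latex
% Each entry of P is at most \mu_+, and each component of P\mathbf{1} sums n entries of P:
|P\mathbf{1}|_\infty \le n\mu_+
\quad\Longrightarrow\quad
\theta^2 = \frac{\mu_+ \, |P\mathbf{1}|_\infty}{n\mu_-^2} \le \frac{\mu_+^2}{\mu_-^2}.

% For a target accuracy \delta (hypothetical), the error bound drops below \delta once
\frac{r n \theta^2 \log(n+m)}{N} \lesssim \delta
\quad\Longleftrightarrow\quad
N \gtrsim \frac{r n \theta^2 \log(n+m)}{\delta}.
```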

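Incorporation with Visual Features and Irrelevant Tags

The optimization problem becomes:

$$\min_{Q \in \Delta} \; -\sum_{i,j=1}^{n,m} \left\{ \frac{d_{i,j}}{m_*} \log Q_{i,j} + \frac{1 - d_{i,j}}{m - m_*} \log(1 - Q_{i,j}) \right\} + \frac{\alpha}{n} \mathrm{Tr}(Q^{\top} L Q) + \beta |Q|_{tr}.$$

* $X = (x_1, \dots, x_n)^{\top}$: visual features of the $n$ images, where $x_i \in \mathbb{R}^d$;
* $W = [w_{i,j}]_{n \times n}$: $w_{i,j} = \exp(-d(x_i, x_j)^2 / \sigma^2)$ is the pairwise similarity;
* $L = \mathrm{diag}(W^{\top} \mathbf{1}) - W$: the graph Laplacian;
* $\mathrm{Tr}(Q^{\top} L Q) = \sum_{i,j=1}^{n} W_{i,j} |Q_{*,i} - Q_{*,j}|^2$: tag-visual content correlation.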
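The graph quantities defined above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: $d(\cdot,\cdot)$ is taken to be the Euclidean distance, Q is stored with one row per image so that the trace is well defined, and all function names are hypothetical. With a symmetric W, the trace evaluates to one half of the pairwise sum, a constant factor that the weight on the trace term absorbs.

```python
import numpy as np


def gaussian_similarity(X, sigma):
    """W[i, j] = exp(-d(x_i, x_j)^2 / sigma^2), with d the Euclidean distance (an assumption).

    X : (n, d) array, one row of visual features per image.
    """
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / sigma ** 2)


def graph_laplacian(W):
    """L = diag(W^T 1) - W."""
    return np.diag(W.T @ np.ones(W.shape[0])) - W


def smoothness(Q, L):
    """Tr(Q^T L Q) for Q of shape (n, m), one row per image."""
    return np.trace(Q.T @ L @ Q)


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))                # 6 images, 3-dimensional features (toy data)
Q = rng.dirichlet(np.ones(4), size=6)      # 6 images, 4 tags, rows sum to 1
W = gaussian_similarity(X, sigma=1.0)
L = graph_laplacian(W)

# Weighted sum of squared differences between the tag distributions of image pairs.
pairwise = sum(W[i, j] * np.sum((Q[i] - Q[j]) ** 2)
               for i in range(6) for j in range(6))
print("Tr(Q^T L Q):", smoothness(Q, L), "| half pairwise sum:", 0.5 * pairwise)
```

The penalty is small when visually similar images (large $W_{i,j}$) receive similar tag distributions, which is the tag-visual consistency the term is meant to encourage.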

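Efficient Solution

Rewrite the objective function as $L(Q) = f(Q) + \varepsilon |Q|_{tr}$, and update

$$Q_k = \arg\min_{Q} P_{t_k}(Q, Q_{k-1}), \qquad P_{t_k}(Q, Q_{k-1}) = \frac{1}{2} \left\| Q - \left( Q_{k-1} - \frac{1}{t_k} \nabla f(Q_{k-1}) \right) \right\|_F^2 + \frac{\varepsilon}{t_k} |Q|_{tr},$$

where $t_k$ is the step size for the $k$-th iteration.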
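When the constraint on Q is ignored, a subproblem of this form has a well-known closed-form minimizer: take the gradient step, then soft-threshold the singular values of the result by $\varepsilon / t_k$ (singular value thresholding). The sketch below shows one such update. It is an assumption-laden illustration, not the authors' solver: the poster does not say how the constraint set $\Delta$ is enforced, so that part is omitted; the gradient covers only the basic likelihood term; and the function names and the one-row-per-image layout are hypothetical.

```python
import numpy as np


def singular_value_threshold(Z, tau):
    """Minimizer of 0.5 * ||Q - Z||_F^2 + tau * |Q|_tr over all matrices Q."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def grad_neg_log_likelihood(Q, D, m_star):
    """Gradient of f(Q) = -sum_ij (d_ij / m_star) * log(Q_ij)."""
    return -D / (m_star * Q)


def proximal_step(Q_prev, D, m_star, eps, t_k):
    """One update Q_k from Q_{k-1}; the constraint Q in Delta is NOT enforced here."""
    Z = Q_prev - grad_neg_log_likelihood(Q_prev, D, m_star) / t_k
    return singular_value_threshold(Z, eps / t_k)


# Toy data: 3 images, 4 tags, 2 observed tags per image.
D = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [1, 0, 0, 1]], dtype=float)
Q0 = np.full((3, 4), 0.25)
Q1 = proximal_step(Q0, D, m_star=2, eps=0.5, t_k=10.0)
print("singular values after one step:", np.linalg.svd(Q1, compute_uv=False))
```

Because the thresholding shrinks every singular value by the same amount, small singular values are set exactly to zero, which is how the iterates become low rank. A complete solver would also have to keep Q inside $\Delta$ after each step.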

Datasets

Comparison with State-of-the-art Baselines

Comparison without Considering Visual Features

With Varied Number of Observed Tags

Tag Completion With Noisy Tags