Robust Multi-Class Transductive Learning with Graphs
Wei Liu and Shih-Fu Chang
Columbia University
June 19, 2009
Wei Liu and Shih-Fu Chang Robust Multi-Class Transductive Learning with Graphs
Introduction
Graph Construction
Graph Learning
Robust Multi-Class Graph Transduction (RMGT)
Experiments
What is Semi-Supervised Learning (SSL)?
★ In the narrow sense, SSL refers particularly to semi-supervised classification using labeled data and unlabeled data, which often includes transductive and inductive cases.
Figure: Narrow-sense semi-supervised learning: transductive learning operates on seen data, while inductive learning extends to unseen data.
What is Semi-Supervised Learning (SSL)?
★ In the wide sense, SSL covers all learning tasks where prior knowledge about a few data points is known and knowledge about the remaining data can be inferred. The knowledge may be labels, response values, vector representations, or pairwise relations.
Figure: Wide-sense semi-supervised learning, e.g., semi-supervised regression and clustering.
Survey and Book
Xiaojin Zhu. Semi-Supervised Learning Literature Survey, Computer Sciences Technical Report 1530, University of Wisconsin-Madison, 2005.
Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning, MIT Press, 2006.
Binary-Class SSL Setting
▸ A data set X = {x1, ..., xl, ..., xn} ⊂ R^d in which the first l samples are labeled and the remaining u = n − l are unlabeled. Prior labels are stored in y ∈ R^n such that yi ∈ {1, −1} if xi is labeled and yi = 0 if unlabeled. We use the graph Laplacian matrix L or its normalized variant L̃ to infer the overall labeling f ∈ R^n.
▸ Graph Laplacian: L = D − W, where W is the weight matrix of the graph G(V, E, W) built on the data set X, and D_ii = Σ_j W_ij.
▸ Normalized graph Laplacian: L̃ = D^(−1/2) L D^(−1/2).
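As an illustrative sketch (not from the slides), the Laplacian definitions above can be checked numerically on a tiny hand-made weight matrix:

```python
import numpy as np

# Small hand-made 4-node weight matrix; values are illustrative only.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)                                  # degrees D_ii = sum_j W_ij
L = np.diag(d) - W                                 # graph Laplacian L = D - W
L_norm = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)   # D^{-1/2} L D^{-1/2}

# Sanity check: f^T L f = (1/2) * sum_ij W_ij (f_i - f_j)^2
f = np.array([1., 1., -1., -1.])
smooth = 0.5 * sum(W[i, j] * (f[i] - f[j])**2
                   for i in range(4) for j in range(4))
assert np.isclose(f @ L @ f, smooth)               # both equal 8.0 here
```

Rows of L sum to zero by construction, which is exactly why constant labelings incur zero smoothness penalty.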
State of the Art
★ Label Propagation – the key is the Laplacian-shaped regularizer.
Gaussian Fields and Harmonic Functions (GFHF), Zhu et al. 2003:

    min_f  f^T L f    s.t.  f_l = y_l

Local and Global Consistency (LGC), Zhou et al. 2004:

    min_f  ‖f − y‖^2 + μ f^T L̃ f

Quadratic Criterion (QC), Bengio et al. 2006:

    min_f  ‖f_l − y_l‖^2 + μ f^T L f + μ_ε ‖f‖^2
★ Remarks
1. All these methods are akin to each other. I found that X. Zhu's method GFHF gives more robust performance because of its hard constraint and absence of trade-off parameters.
2. All these methods heavily depend on the graph structure.
3. All these methods naturally generalize to multi-class problems.
Motivation
1. ”Several graph-based methods listed here are similar to each other. They differ in the particular choice of the loss function and the regularizer. We believe it is more important to construct a good graph than to choose among the methods. However graph construction, as we will see later, is not a well studied area.” (X. Zhu, the SSL survey, 2005)
2. The two most commonly used kinds of graphs: the k-NN graph and the ε-neighborhood graph. Empirically, a weighted k-NN graph with small k tends to perform better.
A Simple Toy Problem – Noisy Two Moons
Figure: Noisy two moons given two labeled points. We only have ground-truth labels for the points on the two moons, so we evaluate classification performance on these on-manifold points.
A Simple Toy Problem – Noisy Two Moons
Figure: Error rates over unlabeled points. (a) LGC with 13.55% error rate using a 10-NN graph; (b) GFHF with 14.21% error rate using a 10-NN graph; (c) GFHF with zero error rate using a symmetry-favored 10-NN graph.
Illumination
▸ Using the traditional k-NN graph, LGC and GFHF make many errors, but GFHF achieves perfect results with the proposed symmetry-favored k-NN graph. This illustrates that graph quality is critical to SSL: the same SSL method yields very different results under different graph construction schemes.
k-NN Graph
▸ Let us define an asymmetric n × n matrix:

    A_ij = exp(−d(x_i, x_j)^2 / σ^2)  if j ∈ N_i,
    A_ij = 0                          otherwise,        (1)

where the set N_i holds the indexes of the k nearest neighbors of point x_i, and d(x_i, x_j) is some distance measure (e.g., Euclidean distance) between x_i and x_j.
▸ The parameter σ is estimated empirically as σ = Σ_{i=1}^{n} d(x_i, x_{ik}) / n, where x_{ik} is the k-th nearest neighbor of x_i. This estimate has proven simple and sufficiently effective.
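A minimal sketch of eq. (1) together with the σ rule above (the function name and the brute-force distance computation are my own, not from the paper):

```python
import numpy as np

def knn_affinity(X, k):
    """Asymmetric k-NN affinity A of eq. (1); sigma is the mean
    distance to the k-th nearest neighbor, as described above."""
    n = X.shape[0]
    # Pairwise Euclidean distances (brute force, fine for small n).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    nbrs = order[:, 1:k + 1]            # N_i: k nearest neighbors (skip self)
    sigma = dist[np.arange(n), nbrs[:, -1]].mean()   # mean k-th NN distance
    A = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    A[rows, cols] = np.exp(-dist[rows, cols] ** 2 / sigma ** 2)
    return A
```

Each row of A then has exactly k positive entries; A is asymmetric in general because neighborhood membership is not mutual.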
k-NN sGraph
▸ Let us define a symmetric n × n matrix:

    W_ij = A_ij + A_ji  if j ∈ N_i and i ∈ N_j,
    W_ij = A_ji         if j ∉ N_i and i ∈ N_j,
    W_ij = A_ij         otherwise.                      (2)

Obviously W = A + A^T, and W is symmetric with W_ii = 0 (to avoid self-loops). This weighting scheme favors the symmetric edges ⟨x_i, x_j⟩ such that x_i is in the neighborhood of x_j and x_j is simultaneously in the neighborhood of x_i.
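Continuing the sketch, eq. (2) amounts to a one-liner (helper name is mine; contrast with the traditional max{A, A^T} weighting):

```python
import numpy as np

def sgraph(A):
    """Symmetry-favored weights of eq. (2): W = A + A^T.
    Mutual-neighbor edges get their weights doubled, whereas the
    traditional k-NN graph would use np.maximum(A, A.T)."""
    return A + A.T
```

For example, if A[0, 1] = A[1, 0] = 0.5 (a mutual edge), then W[0, 1] = 1.0, while a one-sided edge keeps its original weight.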
Remarks
1. The weights of the symmetric edges are explicitly doubled, reflecting the reasonable consideration that two points connected by a symmetric edge are likely to lie on the same submanifold.
2. In contrast, the weighting scheme adopted by traditional k-NN graphs treats all edges in the same manner, defining the weighted adjacency matrix by max{A, A^T}.
3. We call the graph constructed through eq. (2) the symmetry-favored k-NN graph, or k-NN sGraph for short. The proposed graph is relatively robust to noise because it reinforces the similarities between points on manifolds.
Comparison
Figure: 2-NN graph vs. 2-NN sGraph. Thicker edges represent larger edge weights.
Graph Laplacian
▸ Given the constructed graph G(V, E, W), the smoothness semi-norm used in most graph-based approaches is

    ‖f‖_G^2 = (1/2) Σ_{i,j} (f(v_i) − f(v_j))^2 W_ij = f^T L f,

where we obtain the graph Laplacian matrix

    L = D − W.                                          (3)

▸ The degree matrix D ∈ R^{n×n} is a diagonal matrix with D_ii = Σ_{j=1}^{n} W_ij; D_ii approximates the local density of the neighborhood around x_i.
Doubly-Stochastic Matrix
▸ Theorem 1 (in the paper) implies that the smoothness norm emphasizes neighborhoods of high density (large D_ii). However, sampling is usually not uniform in practice, so over-emphasizing high-density neighborhoods may occlude the information in sparse regions.

Figure: Non-uniform sampling.
Doubly-Stochastic Matrix
▸ To fully exploit the power of unlabeled data, we should not let high-density regions dominate the sparse ones. We therefore enforce the equal-degree constraint D_ii = 1 by requiring W1 = 1, which makes the adjacency matrix W a doubly-stochastic matrix.
How to learn?
▸ We learn W from the training data without any presumed functional form. We only assume that W is close to the initial W0 calculated via eq. (2).
▸ We can infuse semi-supervised information into W. Consider the pair set

    T = {(i, j) | i = j, or x_i and x_j have different labels}

and define its matrix form T. In particular, we require W_ij = 0 for (i, j) ∈ T, or equivalently Σ_{(i,j)∈T} W_ij = 0 since W_ij ≥ 0. This constraint is intuitive: it removes self-loops and erroneous edges.
Learning W
▸ We formulate learning a doubly-stochastic W subject to the differently-labeled information T as

    min  G(W) = (1/2) ‖W − W0‖_F^2
    s.t. Σ_{(i,j)∈T} W_ij = 0,
         W1 = 1,  W = W^T,  W ≥ 0,                      (4)

where ‖·‖_F stands for the Frobenius norm. Eq. (4) is an instance of quadratic programming (QP).
Learning W
▸ For efficient computation, we divide this QP problem into two convex sub-problems:

    min  G(W) = (1/2) ‖W − W0‖_F^2
    s.t. Σ_{(i,j)∈T} W_ij = 0,  W1 = 1,  W = W^T        (5)

and

    min  G(W) = (1/2) ‖W − W0‖_F^2    s.t.  W ≥ 0.      (6)
Learning W
▸ The sub-problem in eq. (6) has a simple solution: W = ⌈W0⌉_{≥0}, where the operator ⌈·⌉_{≥0} zeros out all negative entries of W0. This operator is essentially a conic subspace projection.
▸ We solve the sub-problem in eq. (5) by

    W = P(W0, T) = W0 − (t0 + 2·1^T Tμ0 / |T|) T + μ0 1^T + 1 μ0^T,   (7)

where P(W0, T) behaves as an affine subspace projection operator, and t0 and μ0 are also computed from W0.
Successive Projection
▸ We tackle the original QP problem eq. (4) by successive projection using the two subspace projection operators.
▸ Von Neumann's successive projection lemma: the alternating projection process converges onto the intersection of the affine and conic subspaces. The lemma thus ensures that alternately solving sub-problems eq. (5) and (6) converges to the globally optimal solution of the target problem eq. (4).
Algorithm 1. Doubly-Stochastic Adjacency Matrix Learning
INPUT: the initial adjacency matrix W0,
       the differently-labeled information T,
       the maximum iteration number MaxIter.
LOOP:  for m = 1, ..., MaxIter:
       Wm = P(W_{m−1}, T);
       if Wm ≥ 0, stop LOOP;
       else Wm = ⌈Wm⌉_{≥0}.
OUTPUT: W = Wm.
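A simplified sketch of Algorithm 1 (the labeled-pair constraint T is omitted for brevity, so the affine step projects only onto {W : W1 = 1, W = W^T}; function names and the closed-form multipliers are my own derivation, not the paper's P(W0, T)):

```python
import numpy as np

def project_affine(W):
    """Projection onto the affine subspace {W : W1 = 1, W = W^T},
    derived via Lagrange multipliers (cf. eq. (5) without T)."""
    n = W.shape[0]
    r = W.sum(axis=1)                       # current row sums
    s = (n - r.sum()) / (2.0 * n)
    mu = (1.0 - r - s) / n
    one = np.ones(n)
    return W + np.outer(mu, one) + np.outer(one, mu)

def project_nonneg(W):
    """Conic projection of eq. (6): zero out negative entries."""
    return np.maximum(W, 0.0)

def learn_doubly_stochastic(W0, max_iter=5000, tol=1e-9):
    """Von Neumann-style alternating projection, as in Algorithm 1."""
    W = 0.5 * (W0 + W0.T)                   # ensure symmetry
    for _ in range(max_iter):
        W = project_affine(W)
        if W.min() >= -tol:                 # feasible: stop, as in Algorithm 1
            break
        W = project_nonneg(W)
    return project_nonneg(W)
```

Starting from any symmetric nonnegative W0, the output has unit row sums, symmetry, and nonnegative entries, i.e., it is (numerically) doubly stochastic.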
Two Rings Toy Problem
Figure: Two rings toy data.
Two Rings Toy Problem
Figure: k = 10. (a) k-NN graph vs. (b) b-matching graph; the b-matching graph is a regular graph where each node has k adjacent nodes.
Two Rings Toy Problem
Figure: (c) unit-degree graph and (d) unit-degree graph given two labeled points. Both graphs have doubly-stochastic matrices learned based on the 10-NN sGraph. The former does not use the differently-labeled information T (good enough!), while the latter does.
Merits of Doubly-Stochastic Matrix
▸ It offers a nonparametric form for W, flexibly representing data lying in compact clusters or on intrinsic low-dimensional submanifolds.
▸ It is highly robust to noise: e.g., when a noisy sample x_j invades the neighborhood of x_i, the unit-degree constraint keeps the weight W_ij very small compared to the weights between x_i and its closer neighbors.
▸ It provides a “balanced” graph Laplacian whose smoothness norm penalizes the label prediction function uniformly on each sample (node), resulting in uniform label propagation.
Goal
▸ Solve for a soft label matrix F ∈ R^{n×c} for any multi-class SSL task.
Figure: F = [F_{·1}, F_{·2}, ..., F_{·c}] = [F_l; F_u], with one column per class. Given the known class assignment F_l = Y_l, infer the unknown F_u.
Multi-Class Constraints
★ It suffices to suppose the class posteriors for the labeled data to be p(C_k | x_i) = Y_ik = 1 if x_i ∈ C_k and p(C_k | x_i) = Y_ik = 0 otherwise. Importantly, if we knew the class priors ω = [p(C_1), ..., p(C_c)]^T (with ω^T 1_c = 1) and regarded the soft labels F_ik as p(C_k | x_i), we would have

    (1^T F_{·k}) / n ≅ Σ_{i=1}^{n} p(C_k | x_i) / n = Σ_{i=1}^{n} p(x_i) p(C_k | x_i) = p(C_k),   (8)

where the marginal probability p(x_i) ∝ D_ii = 1 is assumed to be 1/n. Eq. (8) induces a hard constraint 1^T F = nω^T (equivalently F^T 1 = nω).
Multi-Class Label Propagation
▸ To address multi-class problems, our motivation is to let the soft labels F_ik carry the main properties of p(C_k | x_i). Hence we impose two hard constraints, F^T 1 = nω and F 1_c = 1 (since Σ_k p(C_k | x_i) = 1; 1_c is the c-dimensional all-ones vector), to obtain a constrained multi-class label propagation:

    min_F  tr(F^T L F)
    s.t.   F_l = Y_l,  F 1_c = 1,  F^T 1 = nω           (9)
Multi-Class Label Propagation
▸ Eq. (9) reduces to

    min  Q(F_u) = tr(F_u^T L_uu F_u) + 2 tr(F_u^T L_ul Y_l)
    s.t. F_u 1_c = 1_u,  F_u^T 1_u = nω − Y_l^T 1_l,    (10)

where L_uu and L_ul are sub-matrices of L = [L_ll, L_lu; L_ul, L_uu], and 1_l and 1_u are the l- and u-dimensional all-ones vectors, respectively.
Multi-Class Label Propagation
▸ Theorem 2 (in the paper) gives a closed-form solution to eq. (10). The formulated multi-class label propagation thus succeeds in incorporating class priors, unlike all existing label propagation methods.
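Since the closed form of Theorem 2 is not reproduced on these slides, here is a hedged sketch of only the unconstrained part of eq. (10), i.e., the plain multi-class harmonic solution F_u = −L_uu^{−1} L_ul Y_l (the class-prior and row-sum constraints are omitted; the function name is mine):

```python
import numpy as np

def harmonic_labels(W, Y_l):
    """Multi-class harmonic propagation: minimizes tr(F^T L F)
    subject only to F_l = Y_l (prior constraints of eq. (10) omitted)."""
    l = Y_l.shape[0]                        # first l samples are labeled
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian, eq. (3)
    L_uu, L_ul = L[l:, l:], L[l:, :l]
    # Setting the gradient of Q(F_u) to zero gives L_uu F_u = -L_ul Y_l.
    F_u = np.linalg.solve(L_uu, -L_ul @ Y_l)
    return F_u
```

On a tiny 4-node graph with one labeled point per class, the argmax of each row of F_u recovers the intended cluster, and each row sums to 1 when the rows of Y_l do.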
Flowchart of RMGT
Figure: The RMGT algorithm. Input feature vectors → k-NN sGraph → doubly-stochastic adjacency matrix learning → unit-degree graph → multi-class label propagation (given prior labels) → global classification.
Experimental Setup
Data            #Features   #Samples   #Classes
USPS (test)     256         2007       10
FRGC (subset)   4608        3160       316

Figure: Digit and face images.

RMGT: without graph adjacency matrix learning.
RMGT(W): with graph adjacency matrix learning.
Performance Curves
Figure: USPS: error rate vs. number of labeled samples (20 to 100). FRGC: recognition rate vs. number of labeled samples (300 to 1000). Compared methods: LGC, SGT, GFHF+CMN, RMGT, and RMGT(W).
Conclusions
▸ All compared SSL algorithms achieve performance gains when switching from k-NN graphs to k-NN sGraphs.
▸ RMGT performs better than the other methods, demonstrating the success of multi-class label propagation with class priors.
▸ RMGT(W) is significantly superior to the others, showing that the proposed graph learning technique (doubly-stochastic adjacency matrix learning) boosts graph-based SSL performance.
Thanks! For any problems, please email [email protected].