Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation
Hongliang Yan 2017/06/21
Domain Adaptation
[1] https://cs.stanford.edu/~jhoffman/domainadapt/
Figure 1. Illustration of dataset bias.
Problem: the training (source) and test (target) sets are related but follow different distributions.
Methodology (DA): minimize source error + domain discrepancy
• Learn a feature space that combines discriminativeness and domain invariance.
Maximum Mean Discrepancy (MMD)
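• MMD represents the distance between two distributions as the distance between the mean embeddings of their features:
$$\mathrm{MMD}^2(s, t) = \sup_{\|\phi\|_{\mathcal{H}} \le 1} \left\| \mathbb{E}_{x^s \sim s}[\phi(x^s)] - \mathbb{E}_{x^t \sim t}[\phi(x^t)] \right\|_{\mathcal{H}}^2$$
• An empirical estimate:
$$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M}\sum_{i=1}^{M}\phi(x_i^s) - \frac{1}{N}\sum_{j=1}^{N}\phi(x_j^t) \right\|_{\mathcal{H}}^2$$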
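As a rough illustration of the empirical estimate, the sketch below computes the biased estimate of MMD² through the kernel trick, so the feature map never has to be formed explicitly. The Gaussian kernel, its bandwidth, the function names, and the toy points are assumptions made for this example and do not come from the slides.

```python
import math

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)). Assumed choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd2(source, target, kernel=rbf_kernel):
    """Biased empirical MMD^2: mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)."""
    m, n = len(source), len(target)
    k_ss = sum(kernel(a, b) for a in source for b in source) / (m * m)
    k_tt = sum(kernel(a, b) for a in target for b in target) / (n * n)
    k_st = sum(kernel(a, b) for a in source for b in target) / (m * n)
    return k_ss + k_tt - 2.0 * k_st

# Toy 2-D samples (illustrative only): the target is the source shifted by (2, 2).
source = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.2)]
shifted = [(2.0, 2.0), (2.1, 1.9), (1.9, 2.2)]

same_vs_same = mmd2(source, source)        # identical samples give a value of (about) zero
same_vs_shifted = mmd2(source, shifted)    # shifted samples give a clearly positive value
```

The three kernel averages correspond to expanding the squared norm of the difference between the two mean embeddings.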
Motivation
• Class weight bias across domains remains unsolved but ubiquitous.
$$\mathrm{MMD}^2(D_s, D_t) = \left\| \frac{1}{M}\sum_{i=1}^{M}\phi(x_i^s) - \frac{1}{N}\sum_{j=1}^{N}\phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^s\, \mathbb{E}_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, \mathbb{E}_c[\phi(x^t)] \right\|_{\mathcal{H}}^2$$
where $w_c^s = M_c / M$ and $w_c^t = N_c / N$ are the class weights of the source and target domains, and $M_c$ and $N_c$ are the numbers of class-$c$ samples in each domain.
The effect of class weight bias should be removed:
① Changes in sample selection criteria
Figure 2. Class prior distribution of three digit recognition datasets.
② Applications are not concerned with the class prior distribution
MMD can be minimized either by learning a domain-invariant representation or by preserving the class weights of the source domain.
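To make the class weight decomposition concrete, here is a small sketch in which every class has exactly the same feature in both domains, yet the mean-embedding gap is non-zero because the class priors differ. The identity feature map, the two-class toy features, and the 80/20 versus 20/80 priors are assumptions chosen for illustration.

```python
def mean_embedding_gap(source, target):
    """Squared distance between sample means, i.e. MMD^2 with the identity feature map."""
    dim = len(source[0])
    mean_s = [sum(x[d] for x in source) / len(source) for d in range(dim)]
    mean_t = [sum(x[d] for x in target) / len(target) for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))

# Hypothetical, perfectly domain-invariant features: one fixed vector per class.
class_feature = {0: (0.0, 1.0), 1: (1.0, 0.0)}

# Only the class weights differ: 80/20 in the source, 20/80 in the target.
source_labels = [0] * 8 + [1] * 2
target_labels = [0] * 2 + [1] * 8
source = [class_feature[y] for y in source_labels]
target = [class_feature[y] for y in target_labels]

gap = mean_embedding_gap(source, target)  # (0.2 - 0.8)^2 + (0.8 - 0.2)^2, about 0.72
```

In this toy setting the discrepancy comes entirely from the mismatch between the source and target class weights, not from the features.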
Weighted MMD
Main idea: reweight the classes in the source domain so that they have the same class weights as the target domain.
• Introduce an auxiliary weight $\alpha_c$ for each class $c$ in the source domain such that
$$w_c^t = \alpha_c\, w_c^s$$
• The weighted MMD then compares the two domains under the same (target) class weights:
$$\mathrm{MMD}_w^2(D_s, D_t) = \left\| \frac{1}{M}\sum_{i=1}^{M}\alpha_{y_i^s}\,\phi(x_i^s) - \frac{1}{N}\sum_{j=1}^{N}\phi(x_j^t) \right\|_{\mathcal{H}}^2 = \left\| \sum_{c=1}^{C} w_c^t\, \mathbb{E}_c[\phi(x^s)] - \sum_{c=1}^{C} w_c^t\, \mathbb{E}_c[\phi(x^t)] \right\|_{\mathcal{H}}^2$$
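The reweighting idea can be sketched as follows: each source sample is scaled by the auxiliary weight of its class, with the weight taken as the ratio of target to source class weight. The sketch assumes an identity feature map and uses true target labels to compute the weights, purely for illustration; in the unsupervised setting target labels are unknown and have to be estimated. All names and data are hypothetical.

```python
from collections import Counter

def class_weights(labels):
    """Empirical class weights: fraction of samples that belong to each class."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in counts}

def weighted_mean_gap(source, source_labels, target, alpha):
    """Weighted MMD^2 with the identity feature map: source samples scaled by alpha[y]."""
    dim = len(source[0])
    m, n = len(source), len(target)
    mean_s = [sum(alpha[y] * x[d] for x, y in zip(source, source_labels)) / m
              for d in range(dim)]
    mean_t = [sum(x[d] for x in target) / n for d in range(dim)]
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))

# Toy data (assumed): identical per-class features, different class priors.
class_feature = {0: (0.0, 1.0), 1: (1.0, 0.0)}
source_labels = [0] * 8 + [1] * 2
target_labels = [0] * 2 + [1] * 8
source = [class_feature[y] for y in source_labels]
target = [class_feature[y] for y in target_labels]

w_s = class_weights(source_labels)
w_t = class_weights(target_labels)   # oracle target weights, for illustration only
alpha = {c: w_t[c] / w_s[c] for c in w_s}

unweighted = weighted_mean_gap(source, source_labels, target, {0: 1.0, 1: 1.0})
weighted = weighted_mean_gap(source, source_labels, target, alpha)
```

With all weights set to 1 the function reduces to the ordinary mean-embedding gap, while the class-ratio weights bring the gap to approximately zero on this toy data.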
Weighted DAN
1. Replace the MMD term in DAN [4] with the weighted MMD. The DAN objective
$$\min_{W}\; \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_{l}(D_s^l, D_t^l)$$
becomes
$$\min_{W,\alpha}\; \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s, y_i^s; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_{l,w}(D_s^l, D_t^l)$$
2. To further exploit the unlabeled data in the target domain, the empirical risk on the target samples is added, as in the semi-supervised model of [5]:
$$\min_{W,\{\hat{y}_j^t\}_{j=1}^{N},\alpha}\; \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s, y_i^s; W) + \gamma\,\frac{1}{N}\sum_{j=1}^{N}\ell(x_j^t, \hat{y}_j^t; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_{l,w}(D_s^l, D_t^l)$$
where $\lambda$ and $\gamma$ are trade-off parameters.
[4] Long M, Cao Y, Wang J. Learning Transferable Features with Deep Adaptation Networks. 2015.
[5] Amini, Massih-Reza, and Patrick Gallinari. "Semi-supervised logistic regression." Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.
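As a minimal sketch of how the three terms of the objective combine, the function below adds a source loss, a target loss on pseudo labels, and a sum of per-layer weighted MMD values. It assumes the loss is cross-entropy over predicted class probabilities and that the weighted MMD values have already been computed; the names `gamma` and `lam` for the trade-off weights, their values, and the inputs are illustrative assumptions.

```python
import math

def cross_entropy(probabilities, labels):
    """Mean negative log-likelihood of the given labels (assumed form of the loss)."""
    return -sum(math.log(p[y]) for p, y in zip(probabilities, labels)) / len(labels)

def wdan_objective(source_probs, source_labels, target_probs, pseudo_labels,
                   weighted_mmd_per_layer, gamma, lam):
    """Source loss + gamma * target loss on pseudo labels + lam * sum of weighted MMDs."""
    source_term = cross_entropy(source_probs, source_labels)
    target_term = cross_entropy(target_probs, pseudo_labels)
    return source_term + gamma * target_term + lam * sum(weighted_mmd_per_layer)

# Hypothetical inputs: uniform two-class predictions and two adapted layers.
value = wdan_objective(
    source_probs=[[0.5, 0.5], [0.5, 0.5]], source_labels=[0, 1],
    target_probs=[[0.5, 0.5]], pseudo_labels=[1],
    weighted_mmd_per_layer=[0.2, 0.3], gamma=0.5, lam=2.0,
)
```

Here every prediction is uniform, so each loss term equals log 2 and the value is 1.5 · log 2 plus the weighted discrepancy term.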
Optimization: an extension of CEM [7]
[7] Celeux, Gilles, and Gérard Govaert. "A classification EM algorithm for clustering and two stochastic versions." Computational Statistics & Data Analysis 14.3 (1992): 315-332.
The parameters to be estimated consist of three parts: $W$, $\alpha$, and $\{\hat{y}_j^t\}_{j=1}^{N}$.
The model is optimized by alternating between three steps:
• E-step: fix $W$ and estimate the class posterior probability of the target samples:
$$p(y_j^t = c \mid x_j^t) = g_c(x_j^t; W)$$
• C-step:
① Assign the pseudo labels $\{\hat{y}_j^t\}_{j=1}^{N}$ on the target domain:
$$\hat{y}_j^t = \arg\max_{c}\; p(y_j^t = c \mid x_j^t)$$
② Update the auxiliary class-specific weights for the source domain:
$$\alpha_c = \hat{w}_c^t / w_c^s, \quad \text{where } \hat{w}_c^t = \sum_{j} \mathbb{1}_c(\hat{y}_j^t) / N$$
$\mathbb{1}_c(x)$ is an indicator function that equals 1 if $x = c$ and 0 otherwise.
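The C-step can be sketched in a few lines: pick the most probable class for each target sample, count how often each class is chosen, and divide the resulting target class weights by the source class weights. The function name, the list-based inputs, and the example posteriors are assumptions; the sketch also assumes every class has a non-zero source weight.

```python
def c_step(target_posteriors, source_class_weights):
    """Return pseudo labels for target samples and the updated class weights alpha."""
    num_classes = len(source_class_weights)
    n = len(target_posteriors)
    # (1) Pseudo label = class with the highest posterior probability.
    pseudo_labels = [max(range(num_classes), key=lambda c: p[c])
                     for p in target_posteriors]
    # (2) Estimated target class weights from the pseudo-label counts.
    target_weights = [sum(1 for y in pseudo_labels if y == c) / n
                      for c in range(num_classes)]
    # Auxiliary weights: ratio of estimated target weight to source weight.
    alpha = [target_weights[c] / source_class_weights[c] for c in range(num_classes)]
    return pseudo_labels, alpha

# Hypothetical posteriors for four target samples and a balanced source domain.
posteriors = [[0.9, 0.1], [0.3, 0.7], [0.2, 0.8], [0.4, 0.6]]
pseudo_labels, alpha = c_step(posteriors, source_class_weights=[0.5, 0.5])
```

On this input three of the four target samples are assigned to class 1, so class 1 is up-weighted in the source domain and class 0 is down-weighted.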
• M-step: fix $\{\hat{y}_j^t\}_{j=1}^{N}$ and $\alpha$, and update $W$. The problem is reformulated as:
$$\min_{W}\; \frac{1}{M}\sum_{i=1}^{M}\ell(x_i^s, y_i^s; W) + \gamma\,\frac{1}{N}\sum_{j=1}^{N}\ell(x_j^t, \hat{y}_j^t; W) + \lambda \sum_{l \in \{l_1,\dots,l_L\}} \mathrm{MMD}_{l,w}(D_s^l, D_t^l)$$
The gradients of all three terms are computable, so $W$ can be optimized with mini-batch SGD.
Experimental results
• Comparison with the state of the art
Table 1. Experimental results on Office-10 + Caltech-10.
• Empirical analysis
Figure 3. Performance of various models under different class weight bias.
Figure 4. Visualization of the learned features of DAN and weighted DAN.
Summary
• Introduce class-specific weights into MMD to reduce the effect of class weight bias across domains.
• Develop the WDAN model and optimize it in a CEM framework.
• Weighted MMD can be applied to other scenarios where MMD is used to measure the distance between distributions, e.g., image generation.
Thanks!
Paper & code are available
Paper: https://arxiv.org/abs/1705.00609
Code: https://github.com/yhldhit/WMMD-Caffe