Mind the class weight bias: weighted maximum mean ...valser.org/webinar/slide/slides/20170621/weighted MMD-6-21 .pdf · 6/21/2017 · 1. Replace MMD with weighted MMD item in DAN[4]:

Mind the class weight bias: weighted maximum mean discrepancy for unsupervised domain adaptation

Hongliang Yan 2017/06/21

Domain Adaptation

[1]https://cs.stanford.edu/~jhoffman/domainadapt/

Figure 1. Illustration of dataset bias.

Problem: Training (Source）

Test(Target)

Training and test sets are related but

under different distributions.

• Learn feature space that combine

discriminativeness and domain invariance.

Methodology:

source error domain discrepancyminimize +

DA

Maximum Mean Discrepancy (MMD)

• An empirical estimate

2 2

x ~ x ~|| || 1

MMD ( , ) sup || [ (x )] [ (x )] ||s t

s t

s ts t E E

H

H

2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

• representing distances between distributions as distances between mean embeddings of features

Motivation• Class weight bias cross domains remains unsolved but ubiquitous

2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H


2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

and s t

c c c cw M M w N N

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H


2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

and s t

c c c cw M M w N N

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H

①Changes in sample selection criteria

Effect of class weight bias should be removed:


2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

and s t

c c c cw M M w N N

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H



Figure 2. Class prior distribution of three digit recognition datasets.


2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

and s t

c c c cw M M w N N

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H



② Applications are not concerned with class prior distribution


2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

and s t

c c c cw M M w N N

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H



② Applications are not concerned with class prior distribution

MMD can be minimized by either learning domain invariant representation or preserving the class weights in source domain.

Weighted MMD

Main idea: reweighting classes in source domain so that they have the same class weights as target domain

2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

t s

c c cw w

• Introducing an auxiliary weight for each class c in source domainc

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H

Weighted MMD

2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||s

i

M ts t

w s t i j

i jyM N

D D H

2

1 1

|| ( (x )) ( (x )) ||C C

s t t

c c c

c c

t

cE ww E

H

Main idea: reweighting classes in source domain so that they have the same class weights as target domain

2 2

1 1

1 1MMD ( , ) || (x ) (x ) ||

M ts t

s t i j

i jM N

D D H

t s

c c cw w

• Introducing an auxiliary weight for each class c in source domainc

2

1 1

|| ( (x )) ( (x )) ||C C

s s t t

c c c c

c c

w E w E

H

Weighted DAN

[4] Long M, Cao Y, Wang J. Learning Transferable Features with Deep Adaptation Networks[J]., 2015.

1. Replace MMD with weighted MMD item in DAN[4]:

1

W1 { ,..., }

1min (x , ; W) MMD ( , )

L

Ms s l l

i i l s t

i l l l

yM

D D

Weighted DAN


1. Replace MMD with weighted MMD item in DAN[4]:

1

W1 { ,..., }

1min (x , ; W) MMD ( , )

L

Ms s l l

i i l s t

i l l l

yM

D D

1

,W,

1 { ,..., }

1min (x , ; W) MMD ( , )

L

Ms s l l

i i l w s t

i l l l

yM

D D

Weighted DAN

2. To further exploit the unlabeled data in target domain, empirical risk is considered as semi-supervised model in [5]:

[5] Amini, Massih-Reza, and Patrick Gallinari. "Semi-supervised logistic regression." Proceedings of the 15th European Conference on Artificial Intelligence. IOS Press, 2002.

1. Replace MMD with Weighted MMD item in DAN[4]:

1

W1 { ,..., }

1min (x , ; W) MMD ( , )

L

Ms s l l

i i l s t

i l l l

yM

D D

11

,ˆW,{ } ,

1 1 { ,..., }

1 1ˆmin (x , ; W) (x , ; W) MMD ( , )

Nj j

L

M Ns s t t l l

i i i i l w s ty

i j l l l

y yM N

D D


Optimization: an extension of CEM[6]

• E-step:

[7] Celeux, Gilles, and Gérard Govaert. "A classification EM algorithm for clustering and two stochastic versions." Computational statistics & Data analysis 14.3 (1992): 315-332.

Fixed W, estimating the class posterior probability of target samples:

Parameters to be estimated including three parts, i.e.,

The model is optimized by alternating between three steps :

1ˆW, ,{ }t N

j jy

( | x )t t

j jp y c

( | x ) (x , W)t t t

j j jp y c g


• E-step:


• C-step:

Fixed W, estimating the class posterior probability of target samples:

① Assign the pseudo labels on target domain:② update the auxiliary class-specific weights for source domain:



1ˆW, ,{ }t N

j jy

( | x )t t

j jp y c

( | x ) (x , W)t t t

j j jp y c g

ˆ arg max ( | x )t t t

j j jc

y p y c 1ˆ{ }t N

j jy

ˆ ˆ ˆ where ( )t s t t

c c c c c j

j

w w w y N 1

( )c x1 is an indictor function which equals 1 if x = c, and equals 0 otherwise.



• M-step:

Fixed and , updating W. The problem is reformulated as:



1ˆW, ,{ }t N

j jy

1ˆ{ }t N

j jy

1

,W

1 1 { ,..., }

1min (x , ; W) (x , ; W) MMD ( , )

L

M Ns s t t l l

i i i i l w s t

i j l l l

y yM

D D

The gradient of the three items is computable and W can be optimized by using a mini-batch SGD.

Experimental results

• Comparison with state-of-the-arts

Table 1. Experimental results on office-10+Caltech-10

Experimental results

• Empirical analysis

Figure 3. Performance of various model under different class weight bias.

Figure 4. Visualization of the learned features of DAN and weighted DAN.

Summary

• Introduce class-specific weight into MMD to reduce the effect of class weight bias cross domains.

• Develop WDAN model and optimize it in an CEM framework.

• Weighted MMD can be applied to other scenarios where MMD is used for distribution distance measurement, e.g., image generation

Thanks!

Paper & code are available

Paper: https://arxiv.org/abs/1705.00609

Code: https://github.com/yhldhit/WMMD-Caffe

Documents

Mind the class weight bias: weighted maximum mean ...valser.org/webinar/slide/slides/20170621/weighted MMD-6-21 .pdf · 6/21/2017 · 1. Replace MMD with weighted MMD item in DAN[4]: