
Component Analysis for PR & HS
Machine Perception of Human Behavior with CA
F. De la Torre / J. Cohn, PAVIS school on CV and PR


Component Analysis for PR & HS

• Computer Vision & Image Processing
  – Structure from motion.
  – Spectral graph methods for segmentation.
  – Appearance and shape models.
  – Fundamental matrix estimation and calibration.
  – Compression.
  – Classification.
  – Dimensionality reduction and visualization.
• Signal Processing
  – Spectral estimation, system identification (e.g. Kalman filter), sensor array processing (e.g. the cocktail party problem, echo cancellation), blind source separation, etc.
• Computer Graphics
  – Compression (BRDF), synthesis, etc.
• Speech, bioinformatics, combinatorial problems.


Independent Component Analysis (ICA)

(Figure: the cocktail-party problem. Three sound sources are mixed into three observed mixtures; ICA recovers three output signals, one per source.)


Why CA for PR & HS?

• Learn from high-dimensional data and few samples.
  – Useful for dimensionality reduction, especially when the underlying functions are smooth.
• Efficient methods: O(dn) << O(n²) (Everitt, 1984).
• Easy to formulate, to solve and to extend:
  – Non-linearities (kernel methods) (Scholkopf & Smola, 2002; Shawe-Taylor & Cristianini, 2004)
  – Probabilistic (latent variable models)
  – Multi-factorial (tensors) (Paatero & Tapper, 1994; O'Leary & Peleg, 1983; Vasilescu & Terzopoulos, 2002; Vasilescu & Terzopoulos, 2003)
  – Exponential family PCA (Gordon, 2002; Collins et al., 2001)
• Natural geometric interpretation.


Are CA methods popular/useful/used?

• Still work to do:
  – Results 1 - 10 of about 83,000 for "Spanish crisis"
  – Results 1 - 10 of about 287,000,000 for "Britney Spears"
• About 28% of CVPR-07 papers use CA.
• Google:
  – Results 1 - 10 of about 1,870,000 for "principal component analysis"
  – Results 1 - 10 of about 506,000 for "independent component analysis"
  – Results 1 - 10 of about 273,000 for "linear discriminant analysis"
  – Results 1 - 10 of about 46,100 for "negative matrix factorization"
  – Results 1 - 10 of about 491,000 for "kernel methods"


Outline

• Introduction (15 min)
• Generative models (40 min): PCA, k-means, spectral clustering, NMF, ICA, MDS
• Discriminative models (40 min): LDA, SVM, OCA, CCA
• Standard extensions of linear models (30 min): kernel methods, latent variable models, tensor factorization
• Unified view (20 min)


Generative models

• Principal Component Analysis/Singular Value Decomposition

• Non-Negative Matrix Factorization

• Independent Component Analysis

• K-means and spectral clustering

• Multi-dimensional Scaling

D ≈ B C


Principal Component Analysis (PCA)

• PCA finds the directions of maximum variation of the data

• PCA decorrelates the original variables

(Pearson, 1901; Hotelling, 1933;Mardia et al., 1979; Jolliffe, 1986; Diamantaras, 1996)

(Figure: toy 2-D data in (x1, x2) with covariance approximately [0.8 0.75; 0.75 0.9]; after PCA the variables are decorrelated and the covariance becomes diagonal, approximately [2.1 0; 0 1.1].)


PCA

d ≈ µ + c_1 b_1 + c_2 b_2 + ... + c_k b_k

D = [d_1 d_2 ... d_n] ≈ B C + µ 1_n^T,   D ∈ R^(d×n), B ∈ R^(d×k), C ∈ R^(k×n), µ ∈ R^(d×1)
(d = pixels, n = images)

• Assuming zero-mean data, the basis B that preserves the maximum variation of the signal is given by the eigenvectors of D D^T:

  D D^T B = B Λ
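A minimal NumPy sketch of this factorization (not from the slides; the toy data and sizes are made up): center the data, take the top-k eigenvectors of D D^T as the basis B, and reconstruct with D ≈ B C + µ 1^T.

import numpy as np

rng = np.random.default_rng(0)
d, n, k = 100, 50, 5                    # d = pixels, n = images, k = subspace dim
D = rng.standard_normal((d, n))         # toy data matrix

mu = D.mean(axis=1, keepdims=True)      # mean image, d x 1
Dc = D - mu                             # zero-mean data

evals, evecs = np.linalg.eigh(Dc @ Dc.T)        # eigenvectors of D D^T
B = evecs[:, np.argsort(evals)[::-1][:k]]       # d x k basis, B^T B = I

C = B.T @ Dc                            # k x n coefficients
D_hat = B @ C + mu                      # reconstruction D ≈ B C + mu 1^T
print("reconstruction error:", np.linalg.norm(D - D_hat))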


Snap-shot method & SVD

• If d >> n (e.g., 100×100-pixel images vs. 300 samples), the d×d matrix D D^T is too large to handle directly.
• D D^T and D^T D have the same (non-zero) eigenvalues (energy) and related eigenvectors (related through D).
• B is a linear combination of the data: B = D α (Sirovich, 1987).
• [α, Λ] = eig(D^T D),   B = D α Λ^(-1/2)
• Check: D D^T (D α) = D (D^T D α) = (D α) Λ, so the columns of D α are eigenvectors of D D^T.

• SVD factorizes the data matrix D as
  D = U Σ V^T,  with U^T U = I, V^T V = I, Σ diagonal,
  so that D D^T = U Λ U^T and D^T D = V Λ V^T.
• PCA: D = B C, with B^T B = I, C C^T = Λ, B ∈ R^(d×k), C ∈ R^(k×n).

(Beltrami, 1873; Schmidt, 1907; Golub & Van Loan, 1989)
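A small sketch of the snap-shot trick under the same assumptions (zero-mean data, d >> n; the sizes below are made up): eigendecompose the n×n matrix D^T D, map the eigenvectors back with B = D α Λ^(-1/2), and check that B spans the same subspace as the top left singular vectors of D.

import numpy as np

rng = np.random.default_rng(1)
d, n, k = 10000, 30, 5                  # e.g. 100x100-pixel images, 30 samples
D = rng.standard_normal((d, n))
D = D - D.mean(axis=1, keepdims=True)   # zero-mean data

lam, alpha = np.linalg.eigh(D.T @ D)    # small n x n eigenproblem
idx = np.argsort(lam)[::-1][:k]
lam, alpha = lam[idx], alpha[:, idx]
B = D @ alpha / np.sqrt(lam)            # d x k orthonormal basis

U, S, Vt = np.linalg.svd(D, full_matrices=False)
print(np.allclose(np.abs(B.T @ U[:, :k]), np.eye(k), atol=1e-6))  # same subspace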

Machine Perception of Human Behavior with CA F. De la Torre/J. Cohn PAVIS school on CV and PR 21

Error function for PCA

• PCA minimizes the following function (Eckart & Young, 1936; Gabriel & Zamir, 1979; Baldi & Hornik, 1989; Shum et al., 1995; De la Torre & Black, 2003a):

  E(B, C) = Σ_{i=1}^{n} ||d_i − B c_i||_2^2 = ||D − B C||_F^2

• The solution is not unique: B C = (B R)(R^{-1} C) for any invertible R ∈ R^(k×k) (De la Torre, 2012).


PCA/SVD in Computer Vision

• PCA/SVD has been applied to:

– Recognition (eigenfaces:Turk & Pentland, 1991; Sirovich & Kirby, 1987; Leonardis & Bischof, 2000; Gong et al., 2000; McKenna et al., 1997a)

– Parameterized motion models (Yacoob & Black, 1999; Black et al., 2000; Black, 1999; Black & Jepson, 1998)

– Appearance/shape models (Cootes & Taylor, 2001; Cootes et al., 1998; Pentland et al., 1994; Jones & Poggio, 1998; Casia & Sclaroff, 1999; Black & Jepson, 1998; Blanz & Vetter, 1999; Cootes et al., 1995; McKenna et al., 1997; de la Torre et al., 1998b)

– Dynamic appearance models (Soatto et al., 2001; Rao, 1997; Orriols & Binefa, 2001; Gong et al., 2000)

– Structure from Motion (Tomasi & Kanade, 1992; Bregler et al., 2000; Sturm & Triggs, 1996; Brand, 2001)

– Illumination based reconstruction (Hayakawa, 1994)

– Visual servoing (Murase & Nayar, 1995; Murase & Nayar, 1994)

– Visual correspondence (Zhang et al., 1995; Jones & Malik, 1992)

– Camera motion estimation (Hartley, 1992; Hartley & Zisserman, 2000)

– Image watermarking (Liu & Tan, 2000)

– Signal processing (Moonen & de Moor, 1995)

– Neural approaches (Oja, 1982; Sanger, 1989; Xu, 1993)

– Bilinear models (Tenenbaum & Freeman, 2000; Marimont & Wandell, 1992)

– Direct extensions (Welling et al., 2003; Penev & Atick, 1996)


“Intercorrelations among variables are the bane of the multivariate researcher’s struggle for meaning.” (Cooley & Lohnes, 1971)


Part-based representation

• The firing rates of neurons are never negative.
• Independent representations.
→ NMF & ICA


Non-negative Matrix Factorization (NMF)

• Positive factorization.

• Leads to part-based representation.

E(B, C) = ||D − B C||_F,   subject to B, C ≥ 0

(Lee & Seung, 1999)


NMF

• The multiplicative algorithm can be interpreted as diagonally rescaled gradient descent:

  min_{B ≥ 0, C ≥ 0} f(B, C) = Σ_ij (d_ij − (B C)_ij)^2

Derivatives:
  ∂f/∂C_ij = (B^T B C)_ij − (B^T D)_ij
  ∂f/∂B_ij = (B C C^T)_ij − (D C^T)_ij

Inference:  C_ij ← C_ij (B^T D)_ij / (B^T B C)_ij
Learning:   B_ij ← B_ij (D C^T)_ij / (B C C^T)_ij

(Lee & Seung, 1999; Lee & Seung, 2000)
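A minimal sketch of these multiplicative updates (Lee & Seung style) in NumPy; the non-negative toy data, the rank k and the small eps guard are assumptions for illustration, not part of the slides.

import numpy as np

rng = np.random.default_rng(2)
d, n, k = 20, 100, 4
D = rng.uniform(size=(d, n))                  # non-negative data

B = rng.uniform(size=(d, k))
C = rng.uniform(size=(k, n))
eps = 1e-9                                    # avoid division by zero
for _ in range(500):
    C *= (B.T @ D) / (B.T @ B @ C + eps)      # inference step (update C)
    B *= (D @ C.T) / (B @ C @ C.T + eps)      # learning step (update B)

print("||D - BC||_F =", np.linalg.norm(D - B @ C))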


Independent Component Analysis (ICA)

• We need more than second-order statistics to represent the signal.


ICA

• Look for sources s_i that are independent.
• PCA finds uncorrelated variables.
• For Gaussian distributions, independence and uncorrelatedness are the same thing.
• Uncorrelated: E(s_i s_j) = E(s_i) E(s_j)
• Independent: E(g(s_i) f(s_j)) = E(g(s_i)) E(f(s_j)) for any non-linear f, g

D = B C,   S = W D with W ≈ B^{-1}
(Hyvärinen et al., 2001)

(Figure: PCA vs. ICA directions on the same 2-D data; ICA recovers the independent source directions, e.g. s = (1, 0) and s = (0, 1), rather than the directions of maximum variance.)


ICA vs PCA (Hyvärinen et al., 2001)


Many optimization criteria

• Minimize high-order moments of the estimated sources S = W D, e.g. the kurtosis:

  kurt(s) = E{s^4} − 3 (E{s^2})^2

• Many other information criteria.
• Also an error function with a sparseness penalty S (e.g. S = |·|) (Olshausen & Field, 1996):

  Σ_{i=1}^{n} ||d_i − B c_i||^2 + Σ_{i=1}^{n} S(c_i)

• Other sparse PCA formulations (Chennubhotla & Jepson, 2001b; Zou et al., 2005; d'Aspremont et al., 2004).
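A hedged sketch of kurtosis-based ICA (a FastICA-style fixed-point iteration with deflation); this is a generic illustration of estimating S = W D, not the authors' implementation, and the toy sources and mixing matrix are made up.

import numpy as np

rng = np.random.default_rng(3)
n = 2000
S_true = np.vstack([np.sign(rng.standard_normal(n)),    # sub-Gaussian source
                    rng.laplace(size=n)])                # super-Gaussian source
A = rng.standard_normal((2, 2))
D = A @ S_true                                           # observed mixtures

# Whiten the mixtures (zero mean, identity covariance)
D = D - D.mean(axis=1, keepdims=True)
evals, E = np.linalg.eigh(D @ D.T / n)
Z = (E / np.sqrt(evals)) @ E.T @ D

# One-unit fixed-point updates that drive |kurtosis| to an extremum,
# with deflation to obtain a second, decorrelated direction.
W = np.zeros((2, 2))
for i in range(2):
    w = rng.standard_normal(2)
    for _ in range(200):
        w = (Z * (w @ Z) ** 3).mean(axis=1) - 3 * w      # E{z (w^T z)^3} - 3 w
        w -= W[:i].T @ (W[:i] @ w)                       # deflate previous directions
        w /= np.linalg.norm(w)
    W[i] = w

S_est = W @ Z   # recovered sources (up to permutation, sign and scale)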


Basis of natural images


Denoising

(Figure: original image; noisy image (30% noise); denoised with a Wiener filter; denoised with ICA.)


The clustering problem

• Partition the data set into c disjoint “clusters” of data points.
• Number of possible partitions (a Stirling number of the second kind):

  S(n, c) = (1/c!) Σ_{i=1}^{c} (−1)^{c−i} (c choose i) i^n ≈ c^n / c!,

  which is astronomically large even for modest n and c.
• The problem is NP-hard, so approximate algorithms are used in practice (k-means, hierarchical clustering, mixtures of Gaussians, etc.).


K-means

E_0(M, G) = ||D − M G^T||_F   (Ding et al., 2002; De la Torre et al., 2006)

(Figure: a toy 2-D example. D holds the (x, y) coordinates of 10 points, G is the binary cluster-indicator matrix, and M holds the cluster means, so M G^T approximates each point by the mean of its cluster.)
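A small sketch of this matrix view of k-means: alternate between the indicator matrix G and the means M so that D ≈ M G^T. The toy data, the number of clusters and the iteration count are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(4)
D = np.hstack([rng.normal(0, .3, (2, 50)), rng.normal(3, .3, (2, 50))])  # 2 x 100
d, n, c = D.shape[0], D.shape[1], 2

labels = rng.integers(0, c, n)
for _ in range(20):
    G = np.eye(c)[labels]                        # n x c binary indicator matrix
    M = (D @ G) / np.maximum(G.sum(axis=0), 1)   # d x c cluster means
    dist = ((D[:, :, None] - M[:, None, :]) ** 2).sum(axis=0)   # n x c distances
    labels = dist.argmin(axis=1)                 # reassign each point

G = np.eye(c)[labels]
print("E0(M, G) =", np.linalg.norm(D - M @ G.T))   # ||D - M G^T||_F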


Spectral Clustering

(Dhillon et al., 2004; Zass & Shashua, 2005; Ding et al., 2005; De la Torre et al., 2006)

(Figure: an affinity matrix built from pairwise similarities, and its leading eigenvectors, which reveal the cluster structure.)
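A hedged sketch of this recipe: build a Gaussian affinity matrix from pairwise distances, normalize it, and read a two-way partition from the sign of its second leading eigenvector. The toy data, the kernel width and the sign-thresholding rule are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(5)
D = np.hstack([rng.normal(0, .3, (2, 50)), rng.normal(3, .3, (2, 50))])   # 2 clusters

sq = ((D[:, :, None] - D[:, None, :]) ** 2).sum(axis=0)   # squared distances
W = np.exp(-sq / (2 * 0.5 ** 2))                          # Gaussian affinity matrix

deg = W.sum(axis=1)
A = W / np.sqrt(np.outer(deg, deg))        # degree-normalized affinity
evals, evecs = np.linalg.eigh(A)
labels = (evecs[:, -2] > 0).astype(int)    # sign of the 2nd leading eigenvector
print(labels)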


Multi-Dimensional Scaling (MDS)

• MDS takes a matrix of pair-wise distances and finds an embedding that preserves the inter-point distances.

(Figure: pair-wise distances between US cities, and the spatial layout of the cities recovered in an embedded space with North/South and East/West axes.)


MDS (II)

  min_{y_1, ..., y_n} Σ_i Σ_j ( f(δ_ij) − d_ij )^2

where δ_ij are the pair-wise distances in the original data space (given) and d_ij = ||y_i − y_j||_2^2 are the pair-wise distances in the embedded space (unknown).

(Figure: points in the original data space and the corresponding points in the embedded space.)
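A sketch of one standard closed-form way to obtain such an embedding, classical (Torgerson) MDS: double-center the squared-distance matrix and eigendecompose it. The stress function above can also be minimized directly by gradient descent; the data and target dimension below are made up for illustration.

import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 20))                           # hidden 3-D configuration
sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)    # given squared distances

n = sq.shape[0]
J = np.eye(n) - np.ones((n, n)) / n                        # centering matrix
G = -0.5 * J @ sq @ J                                      # Gram matrix of the points
evals, evecs = np.linalg.eigh(G)
Y = evecs[:, -2:] * np.sqrt(evals[-2:])                    # 2-D embedding, n x 2

emb_sq = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
print("mean distortion:", np.abs(emb_sq - sq).mean())      # embedded vs. original distances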


MDS (III)


Discriminative models

• Linear Discriminant Analysis (LDA)

• Support Vector Machines (SVM)

• Oriented Component Analysis (OCA)

• Canonical Correlation Analysis (CCA)


Linear Discriminant Analysis (LDA)

• Optimal linear dimensionality reduction if the classes are Gaussian with equal covariance matrices.

  J(B) = |B^T S_b B| / |B^T S_t B|,   solved by the generalized eigenvalue problem   S_b B = S_t B Λ

  S_b = Σ_{i=1}^{C} Σ_{j=1}^{C} (µ_i − µ_j)(µ_i − µ_j)^T     (between-class scatter)
  S_t = Σ_{i=1}^{n} d_i d_i^T = D D^T                        (total scatter)
  S_w = Σ_{i=1}^{C} Σ_{j∈class i} (d_j − µ_i)(d_j − µ_i)^T   (within-class scatter)

(Fisher, 1938; Mardia et al., 1979; Bishop, 1995)
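A small sketch of LDA as a generalized eigenvalue problem, using scipy.linalg.eigh on the pair (S_b, S_t). The toy two-class data are made up, and S_b is computed with the common class-count-weighted definition, a standard variant of the pairwise form above.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(7)
D = np.hstack([rng.normal(0, 1, (5, 40)), rng.normal(2, 1, (5, 40))])   # d x n
y = np.repeat([0, 1], 40)

mu = D.mean(axis=1, keepdims=True)
St = (D - mu) @ (D - mu).T                        # total scatter
Sb = np.zeros_like(St)
for c in np.unique(y):
    mc = D[:, y == c].mean(axis=1, keepdims=True)
    Sb += (y == c).sum() * (mc - mu) @ (mc - mu).T    # between-class scatter

evals, evecs = eigh(Sb, St)                       # S_b b = λ S_t b
B = evecs[:, np.argsort(evals)[::-1][:1]]         # top discriminative direction
print((B.T @ D).ravel()[:6])                      # projected samples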


Support Vector Machines (SVM)

• Linear classifier:  f(x) = sign(w^T x + b)

(Figure: a 2-D training set where one marker denotes the +1 class and another the −1 class. How would you classify this data?)

• Infinitely many lines classify the data well, but which one is the best?


Large margin linear classifier

• The linear discriminant function with the maximum margin is the best.
• Why is it the best?
  – Robust to outliers, and shown to improve generalization.
• The margin width is

  M = d^+ + d^− = 2 / ||w||_2

  where d^+ and d^− are the distances from the separating hyperplane to the closest positive and negative examples (the support vectors), which bound the “safe zone”.


• Maximizing the margin M = 2 / ||w||_2 gives a quadratic optimization problem with linear constraints:

  min_w ||w||_2^2
  s.t.  w^T x_i + b ≥ +1  for y_i = +1
        w^T x_i + b ≤ −1  for y_i = −1


SVM formulation

• But what if there are errors or the decision boundary is non-linear?
• Slack variables ξ_i can be added to allow misclassification of difficult or noisy data points (e.g. ξ_1 and ξ_2 in the figure):

  min_{w, ξ} ||w||_2^2 + C Σ_{i=1}^{n} ξ_i
  s.t.  w^T x_i + b ≥ +1 − ξ_i  for y_i = +1
        w^T x_i + b ≤ −1 + ξ_i  for y_i = −1
        ξ_i ≥ 0

• The constant C balances the trade-off between the margin and the classification error, and controls over-fitting.


SVM is a regularized network

• In the noiseless case, LDA on the support vectors is equivalent to SVM (Shashua, 1999).
• The SVM classifier can also be optimized in the primal, without constraints:

  min_w ||w||_2^2 + C Σ_{i=1}^{N} max(0, 1 − y_i w^T x_i)^2

  where the first term is the regularization and the second is the training error measured with the hinge-loss function.

(Figure: the hinge loss as a function of w^T x for y = +1 and y = −1; it is zero beyond the margin at ±1.)
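A hedged sketch of this unconstrained primal formulation, minimized by plain gradient descent with the squared hinge loss; the bias is folded into w by appending a constant feature, and the toy data, C and the step size are assumptions.

import numpy as np

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(-1, .7, (50, 2)), rng.normal(+1, .7, (50, 2))])
X = np.hstack([X, np.ones((100, 1))])           # constant feature for the bias
y = np.repeat([-1.0, 1.0], 50)

C, lr = 1.0, 1e-3
w = np.zeros(3)
for _ in range(2000):
    margin = 1 - y * (X @ w)
    active = margin > 0                          # points inside the margin
    grad = 2 * w - 2 * C * ((y * margin * active) @ X)
    w -= lr * grad

print("training accuracy:", (np.sign(X @ w) == y).mean())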


Other classifiers

• Several other loss functions lead to other classifiers (e.g., logistic regression, AdaBoost).

(Figure: the 0/1 loss and the hinge loss plotted against the margin.)

• Logistic loss: log(1 + e^{−y_i f(x_i)}). It also penalizes examples that are correctly classified but close to the margin.


Oriented Component Analysis (OCA)

• OCA maximizes a signal-to-noise ratio,

  (b_OCA^T Σ_signal b_OCA) / (b_OCA^T Σ_noise b_OCA),

  which leads to a generalized eigenvalue problem:  Σ_signal b_k = λ_k Σ_noise b_k.
• b_OCA is steered by the distribution of the noise.


OCA for face recognition

• For face recognition, each basis vector maximizes a ratio of quadratic forms of the form

  (b_k^T Σ^e b_k) / (b_k^T Σ^i b_k),

  where Σ^e plays the role of the signal covariance and Σ^i the noise covariance (De la Torre et al., 2005).


Canonical Correlation Analysis (CCA)

• Perform PCA on each set independently and then learn a mapping between the two low-dimensional representations (PCA + PCA).
• However, independent dimensionality reduction of each set can lose signals that have small energy but are highly correlated across the sets.


Canonical Correlation Analysis (CCA)

• Learn relations between multiple data sets (e.g. find features in one set related to another data set).
• Given two sets X ∈ R^(d1×n) and Y ∈ R^(d2×n), CCA finds the pair of directions w_x and w_y that maximize the correlation between the projections (assuming zero-mean data):

  ρ = (w_x^T X Y^T w_y) / sqrt( (w_x^T X X^T w_x) (w_y^T Y Y^T w_y) )

• There are several ways of optimizing it. A stationary point of ρ is a solution of CCA, given by the generalized eigenvalue problem A w = λ B w with

  A = [ 0      X Y^T ],   B = [ X X^T   0     ],   w = [ w_x ],
      [ Y X^T  0     ]        [ 0       Y Y^T ]        [ w_y ]

  where A, B ∈ R^((d1+d2)×(d1+d2))   (Mardia et al., 1979; Borga, 1998).
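A sketch of CCA solved through this generalized eigenvalue problem with scipy.linalg.eigh. The toy data share one latent signal, and the small ridge added to B (to keep it positive definite) is an assumption, not part of the slide.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(9)
n = 200
z = rng.standard_normal(n)                        # shared latent signal
X = np.vstack([z, rng.standard_normal(n)]) + 0.1 * rng.standard_normal((2, n))
Y = np.vstack([rng.standard_normal(n), z]) + 0.1 * rng.standard_normal((2, n))
X -= X.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

dx, dy = X.shape[0], Y.shape[0]
A = np.block([[np.zeros((dx, dx)), X @ Y.T],
              [Y @ X.T, np.zeros((dy, dy))]])
B = np.block([[X @ X.T, np.zeros((dx, dy))],
              [np.zeros((dy, dx)), Y @ Y.T]]) + 1e-8 * np.eye(dx + dy)

evals, evecs = eigh(A, B)                         # A w = λ B w
w = evecs[:, -1]                                  # largest λ = largest correlation
wx, wy = w[:dx], w[dx:]
print("canonical correlation ≈", np.corrcoef(wx @ X, wy @ Y)[0, 1])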


Virtual avatars with CCA (De la Torre & Black, 2001)


Linear methods fail


• Data points in a non-linear manifold

• There is no good linear mapping to map to a plane

• Linear methods only rotate/translate/scale the data


Linear methods fail

• Learning a non-linear representation for classification.

(Figure: examples for which a linear (cosine) similarity is close to 1, close to 0, and close to −1.)


Linear methods not enough

• Linear methods:
  – Unique optimal solutions.
  – Fast learning algorithms.
  – Better statistical analysis.
• Problem:
  – Insufficient capacity, as Minsky and Papert pointed out in their book Perceptrons.
  – Neural networks add non-linear layers (e.g., the MLP); they solve the capacity problem but are hard to train and suffer from local minima.
• Kernel methods:
  – Use linear techniques but work in a high-dimensional feature space:  x → Φ(x)


Kernel methods

• The kernel defines an implicit mapping (usually high-dimensional) from the input space to a feature space, so that the data becomes linearly separable.
• Computation in the feature space can be costly because it is (usually) high-dimensional; the feature space can even be infinite-dimensional.

(Figure: input space vs. feature space.)  For example:  (x_1, x_2) → (x_1^2, √2 x_1 x_2, x_2^2) = (z_1, z_2, z_3)


Kernel methods (II)

• Suppose φ(·) is given as above: φ(x) = (x_1^2, √2 x_1 x_2, x_2^2).
• An inner product in the feature space is then
  ⟨φ(x), φ(z)⟩ = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = (x^T z)^2.
• So, if we define the kernel function as k(x, z) = (x^T z)^2, there is no need to carry out φ(·) explicitly.
• This use of a kernel function to avoid carrying out φ(·) explicitly is known as the kernel trick. Any linear algorithm that can be expressed in terms of inner products can be made nonlinear by going to the feature space.


Kernel PCA (Scholkopf et al., 1998)


Kernel PCA (II)

• Eigenvectors of the covariance matrix in feature space:

  C = (1/n) Σ_{i=1}^{n} Φ(d_i) Φ(d_i)^T,    C b_1 = λ_1 b_1

• The eigenvectors lie in the span of the data in feature space:

  b_1 = Σ_{i=1}^{n} α_i Φ(d_i)

• Substituting and projecting onto the data turns this into an eigenvalue problem on the kernel matrix K, with K_ij = Φ(d_i)^T Φ(d_j):

  K α = λ α

(Scholkopf et al., 1998)
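A minimal kernel-PCA sketch following this derivation: form the kernel matrix K, center it in feature space, solve K α = λ α, and project the training points. The RBF kernel and its width are assumptions used only for illustration.

import numpy as np

rng = np.random.default_rng(10)
D = rng.standard_normal((2, 100))                         # d x n data

sq = ((D[:, :, None] - D[:, None, :]) ** 2).sum(axis=0)
K = np.exp(-sq / 2.0)                                     # RBF kernel matrix

n = K.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                                            # kernel centered in feature space

lam, alpha = np.linalg.eigh(Kc)
idx = np.argsort(lam)[::-1][:3]
lam, alpha = lam[idx], alpha[:, idx]
alpha = alpha / np.sqrt(lam)           # so that each basis vector b_i has unit norm

proj = Kc @ alpha                      # nonlinear principal components, n x 3
print(proj[:5])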


Outline

• Introduction (5 min)
• Generative models (20 min): PCA, k-means, spectral clustering, NMF, ICA, MDS
• Discriminative models (20 min): LDA, SVM, OCA, CCA
• Standard extensions of linear models (15 min): kernel methods, latent variable models, tensor factorization
• Unified view (15 min)
• Extended generative models (50 min): RPCA, PaCA, ACA
• Extended discriminative models (1 hour): MODA, Parda, CTW, seg-SVM


Factor Analysis

• A Gaussian distribution on the coefficients and on the noise is added to PCA → Factor Analysis:

  d = µ + B c + η
  p(c) = N(c | 0, I_k),   p(d | c) = N(d | µ + B c, Ψ),   p(η) = N(η | 0, Ψ),   Ψ = diag(ψ_1, ..., ψ_d)
  cov(d) = E[(d − µ)(d − µ)^T] = B B^T + Ψ
  (Mardia et al., 1979)

• Inference (Roweis & Ghahramani, 1999; Tipping & Bishop, 1999a): p(c, d) is jointly Gaussian, so

  p(c | d) = N(c | m, V),   m = B^T (B B^T + Ψ)^{-1} (d − µ),   V = I − B^T (B B^T + Ψ)^{-1} B

(Figure: a point that PCA reconstructs with low error can still have a high reconstruction error, i.e. a low likelihood, under FA.)
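A small sketch of this inference step: given B, µ and the diagonal noise covariance Ψ, compute the Gaussian posterior p(c|d) = N(m, V) for one observation. The model parameters and the observation below are made up.

import numpy as np

rng = np.random.default_rng(11)
d, k = 6, 2
B = rng.standard_normal((d, k))
mu = rng.standard_normal((d, 1))
Psi = np.diag(rng.uniform(0.1, 0.5, d))            # diagonal noise covariance

c_true = rng.standard_normal((k, 1))               # sample d = mu + B c + eta
x = mu + B @ c_true + rng.multivariate_normal(np.zeros(d), Psi).reshape(-1, 1)

Sigma = B @ B.T + Psi                              # cov(d) = B B^T + Psi
m = B.T @ np.linalg.solve(Sigma, x - mu)           # posterior mean of c
V = np.eye(k) - B.T @ np.linalg.solve(Sigma, B)    # posterior covariance of c
print("posterior mean:", m.ravel(), " true c:", c_true.ravel())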


PPCA

• If Ψ = E(η η^T) = ε I_d, factor analysis becomes probabilistic PCA (PPCA).
• As ε → 0, B^T (B B^T + Ψ)^{-1} → (B^T B)^{-1} B^T, and PPCA is equivalent to PCA.

• Probabilistic visual learning (Moghaddam & Pentland, 1997): the marginal p(d) = ∫ p(d | c) p(c) dc factors into a Gaussian within the k-dimensional principal subspace (variances λ_i, coefficients c_i = b_i^T d) and an isotropic Gaussian with variance ρ in the remaining d − k directions:

  p(d) ≈ [ exp(−(1/2) Σ_{i=1}^{k} c_i^2 / λ_i) / ((2π)^{k/2} Π_{i=1}^{k} λ_i^{1/2}) ] · [ exp(−ε^2(d) / (2ρ)) / (2πρ)^{(d−k)/2} ]

  where ε^2(d) is the residual (distance from the principal subspace).


More on PPCA

• Tracking (Yang et al., 1999; Yang et al., 2000a; Lee et al., 2005; de la Torre et al., 2000b)
• Recognition/detection (Moghaddam et al., 2000; Shakhnarovich & Moghaddam, 2004; Everingham & Zisserman, 2006)
• PCA for the exponential family (Collins et al., 2001)
• Extension to mixtures of PPCA (mixtures of subspaces) (Tipping & Bishop, 1999b; Black et al., 1998; Jebara et al., 1998)


Tensor faces (Vasilescu & Terzopoulos, 2002; Vasilescu & Terzopoulos, 2003)

(Figure: a face data tensor whose modes are people, views, illuminations and expressions.)


Eigenfaces

• Facial images (identity change).
• Eigenface basis vectors capture the variability in facial appearance; they do not decouple pose, illumination, etc.


Data Organization

• Linear/PCA: data matrix
  – R^(pixels × images)
  – a matrix of image vectors
• Multilinear: data tensor
  – R^(people × views × illums × express × pixels)
  – an N-dimensional matrix
  – 28 people, 45 images/person
  – 5 views, 3 illuminations, 3 expressions per person

(Figure: the same images arranged as a matrix D of pixels × images and as a tensor indexed by people, views, illuminations, expressions and pixels.)


N-Mode SVD Algorithm

D = Z ×_1 U_people ×_2 U_views ×_3 U_illums ×_4 U_express ×_5 U_pixels

For N = 3 modes, element-wise:  d_ijk = Σ_l Σ_p Σ_s z_lps u_il u_jp u_ks
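A hedged NumPy sketch of the N-mode SVD recipe: take the left singular vectors of each mode-k unfolding as U_k, form the core tensor Z with mode-k products by U_k^T, and reconstruct. The tensor dimensions follow the data organization slide; the unfolding convention is an implementation choice.

import numpy as np

rng = np.random.default_rng(12)
D = rng.standard_normal((28, 5, 3, 3, 64))   # people x views x illums x express x pixels

def unfold(T, mode):
    # mode-k unfolding: mode-k fibers become the columns of a matrix
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Mode matrices U_k = left singular vectors of each unfolding
U = [np.linalg.svd(unfold(D, k), full_matrices=False)[0] for k in range(D.ndim)]

# Core tensor Z = D x_1 U_1^T x_2 U_2^T ... x_5 U_5^T
Z = D
for k, Uk in enumerate(U):
    Z = np.moveaxis(np.tensordot(Uk.T, Z, axes=(1, k)), 0, k)

# Reconstruction D = Z x_1 U_1 ... x_5 U_5 (exact here, since nothing is truncated)
R = Z
for k, Uk in enumerate(U):
    R = np.moveaxis(np.tensordot(Uk, R, axes=(1, k)), 0, k)
print(np.allclose(R, D))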


(Figure: PCA basis images vs. TensorFaces basis images.)


TensorFaces

(Figure: compression comparison against the original, which uses 176 basis vectors. TensorFaces with 33 basis vectors (3 illumination + 11 people parameters): mean sq. err. = 409.15. PCA with 33 basis vectors (33 parameters): mean sq. err. = 85.75. TensorFaces with 66 basis vectors (6 illumination + 11 people parameters). Strategic data compression = perceptual quality.)

• TensorFaces data reduction in the illumination space primarily degrades illumination effects (cast shadows, highlights).
• PCA has a lower mean square error but a higher perceptual error.


The fundamental equation of CA (De la Torre & Kanade, 2006; De la Torre, 2012)

Given two data sets D ∈ R^(d×n) and X ∈ R^(x×n):

  E_0(A, B) = ||W_r (Γ − A B^T Ψ) W_c||_F,   with Γ = ϕ(D) and Ψ = θ(X),

where A and B are regression matrices, W_r is a weight matrix for the rows and W_c a weight matrix for the columns, and ϕ(·), θ(·) are (possibly kernel) mappings of the two data sets.


Properties of the cost function

• E_0(A, B) has a unique global minimum (Baldi & Hornik, 1989).
• Closed-form solutions for A and B are obtained by eliminating the other factor; e.g. minimizing E_0 over B and substituting back shows that the optimal A maximizes

  tr( (A^T Ψ W_c^2 Ψ^T A)^{-1} A^T Ψ W_c^2 Γ^T W_r^2 Γ W_c^2 Ψ^T A ),

  and an analogous trace expression gives the closed-form solution for B.


Principal Component Analysis (PCA)

• PCA finds the directions of maximum variation of the data based on linear correlation (Pearson, 1901; Hotelling, 1933; Mardia et al., 1979; Jolliffe, 1986; Diamantaras, 1996).
• Kernel PCA finds the directions of maximum variation of the data in the feature space, e.g.  (x_1, x_2) → (x_1^2, √2 x_1 x_2, x_2^2) = (z_1, z_2, z_3).


PCA-Kernel PCA

• The primal problem, as a special case of E_0(A, B) = ||W_r(Γ − A B^T Ψ)W_c||_F:

  E_kpca(A, B) = ||ϕ(D) − A B^T ϕ(D)||_F

  (Eckart & Young, 1936; Gabriel & Zamir, 1979; Baldi & Hornik, 1989; Shum et al., 1995; de la Torre & Black, 2003a)

• The dual problem: eliminating one factor, minimizing E_kpca is equivalent to maximizing

  E_kpca(A) = tr( (A^T A)^{-1} (A^T ϕ(D) ϕ(D)^T A) )      (covariance form)
  E_kpca(B) = tr( (B^T B)^{-1} (B^T ϕ(D)^T ϕ(D) B) )      (kernel-matrix form)


Error function for LDA

  E_LDA(A, B) = ||(G^T G)^{-1/2} (G^T − A B^T D)||_F,    D = [d_1 d_2 ... d_n]

where G ∈ {0,1}^(n×c) is the class-indicator (label) matrix, with g_ij ∈ {0,1} and G 1_c = 1_n, e.g.

  G^T = [ 1 ... 0 ]
        [ 0 ... 1 ]
        [ 0 ... 0 ]      (c = classes, n = samples)

and B ∈ R^(d×k) is the projection basis (d = pixels, k = dimension of the subspace).

• The regression has n×c equations and d×c unknowns; when d >> n this is an underdetermined system of equations (over-fitting).

(de la Torre & Kanade, 2006)

(de la Torre & Kanade, 2006)

81 Machine Perception of Human Behavior with CA F. De la Torre/J. Cohn PAVIS school on CV and PR

• CCA is the same as LDA changing the label matrix by a

new set X

Fc

T

rE ||)(||),(0 WBAWBA Ψ−Γ=

F

TT

CCAE ||)()(||),( 2

1

XBADDDBA −=−

Canonical Correlation Analysis (CCA)

=

0...0

1...0

0...1TG

c=classes

n=samples

82


K-means

  E_0(A, B) = ||W_r (D − A B^T Ψ) W_c||_F    (Ding et al., 2002; De la Torre et al., 2006)

(Figure: the same toy 2-D example as before. D holds the (x, y) coordinates of 10 points, A plays the role of the cluster means and B^T the role of the binary indicator matrix, so A B^T approximates each point by the mean of its cluster.)


Normalized cuts

  E_0(A, B) = ||W_r (Γ − A B^T Ψ) W_c||_F,    Γ = ϕ(D) = [ϕ(d_1) ϕ(d_2) ... ϕ(d_n)]

(Dhillon et al., 2004; Zass & Shashua, 2005; Ding et al., 2005; De la Torre et al., 2006)

• The affinity matrix is ϕ(D)^T ϕ(D); particular choices of the weights recover Normalized Cuts (Shi & Malik, '00) and Ratio-cuts (Hagen & Kahng, '02).


Other Connections

• The LS-KRRR (E_0) is also the generative model for Laplacian Eigenmaps, Locality Preserving Projections, MDS, Partial Least Squares, etc.
• Benefits of the LS framework:
  – A common framework to understand the differences and commonalities between CA methods (e.g. KPCA, KLDA, KCCA, Ncuts).
  – Better understanding of normalization factors and generalizations.
  – Efficient numerical optimization, below O(n^3) or O(d^3), where n is the number of samples and d the number of dimensions.