Introduction to Machine Learning CMU-10701 20. Independent Component Analysis Barnabás Póczos


Introduction to Machine Learning

CMU-10701 20. Independent Component Analysis

Barnabás Póczos


Contents

ICA model

ICA applications

ICA generalizations

ICA theory


Independent Component Analysis


Goal:

Independent Component Analysis


Observations (Mixtures)

original signals

Model

ICA estimated signals

Independent Component Analysis


We observe

Model

We want

Goal:

Independent Component Analysis


• Perform linear transformations

• Matrix factorization

PCA: X = US, with U of size N × M, M < N (low rank matrix factorization for compression)

ICA: X = AS, with A of size N × N (full rank matrix factorization to remove dependency among the rows)

ICA vs PCA, Similarities


PCA: X = US, U^T U = I

ICA: X = AS

PCA does compression

• M < N

ICA does not do compression

• same # of features (M = N)

PCA just removes correlations, not higher order dependence

ICA removes correlations, and higher order dependence

PCA: some components are more important than others (based on eigenvalues)

ICA: components are equally important

ICA vs PCA, Differences


Note

• PCA vectors are orthogonal

• ICA vectors are not orthogonal in general

ICA vs PCA


ICA vs PCA


Sources s(t) → Mixing → Observation x(t) = As(t) → PCA Estimation → y(t) = Wx(t)

The Cocktail Party Problem SOLVING WITH PCA


Sources s(t) → Mixing → Observation x(t) = As(t) → ICA Estimation → y(t) = Wx(t)

The Cocktail Party Problem SOLVING WITH ICA
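
The mixing model x(t) = As(t) and the demixing step y(t) = Wx(t) can be illustrated with a minimal sketch. The two source signals and the mixing matrix A below are hypothetical choices, not taken from the slides. The sketch uses the true inverse of A only to show what a perfect W would do; an ICA algorithm has to estimate W from X alone, and can only do so up to permutation and scaling of the rows.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
t = np.linspace(0.0, 1.0, T)

# Two hypothetical "speakers": a sine wave and uniform noise.
S = np.vstack([np.sin(2 * np.pi * 5 * t), rng.uniform(-1.0, 1.0, T)])

# Hypothetical mixing matrix A: each microphone hears both speakers.
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S            # observations, x(t) = A s(t)

# With A known, W = A^{-1} demixes exactly.
# ICA has to estimate such a W without access to A or S.
W = np.linalg.inv(A)
Y = W @ X            # estimated sources, y(t) = W x(t)
```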


STATIC

• Image denoising

• Microarray data processing

• Decomposing the spectra of galaxies

• Face recognition

• Facial expression recognition

• Feature extraction

• Clustering

• Classification

• Deep Neural Networks

TEMPORAL

• Medical signal processing – fMRI, ECG, EEG

• Brain Computer Interfaces

• Modeling of the hippocampus, place cells

• Modeling of the visual cortex

• Time series analysis

• Financial applications

• Blind deconvolution

Some ICA Applications


EEG ~ Neural cocktail party

Severe contamination of EEG activity by

• eye movements
• blinks
• muscle
• heart, ECG artifact
• vessel pulse
• electrode noise
• line noise, alternating current (60 Hz)

ICA can improve signal

• effectively detect, separate and remove activity in EEG records from a wide variety of artifactual sources. (Jung, Makeig, Bell, and Sejnowski)

ICA weights help find location of sources

ICA Application, Removing Artifacts from EEG

Fig from Jung

ICA Application, Removing Artifacts from EEG

Fig from Jung

Removing Artifacts from EEG


Figure panels: original, noisy, Wiener filtered, median filtered, ICA denoised

ICA for Image Denoising

(Hoyer, Hyvarinen)


Method for analysis and synthesis of human motion from motion captured data

Provides perceptually meaningful components

109 markers, 327 parameters ⇒ 6 independent components (emotion, content, …)

ICA for Motion Style Components

(Mori & Hoshino 2002, Shapiro et al 2006, Cao et al 2003)


Figure panels: walk, sneaky, walk with sneaky, sneaky with walk


Gabor wavelets,

edge detection,

receptive fields of V1 cells...

ICA basis vectors extracted from natural images


PCA basis vectors extracted from natural images


ICA Theory


uncorrelated and independent variables

entropy, joint entropy, negentropy

mutual information

Kullback-Leibler divergence

Basic terms, definitions


Proof: Homework

Definition:

Lemma:

Definition:

Statistical (in)dependence


Definition:

Lemma:

Proof: Homework

Lemma:

Proof: Homework

Lemma:

Proof: Homework

Correlation
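
The lemmas on this slide did not survive extraction. A standard fact in this area is that independent variables are uncorrelated, while uncorrelated variables may still be dependent. The sketch below illustrates the second half with a hypothetical pair (x, y), where y is a deterministic function of x and yet has (nearly) zero linear correlation with it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x ** 2 - 1.0     # deterministic function of x, hence dependent on x

# Linear correlation between x and y: close to 0, since E[x^3] = 0.
corr_xy = np.corrcoef(x, y)[0, 1]

# Correlation after a nonlinear map of x: the dependence becomes visible.
corr_x2y = np.corrcoef(x ** 2, y)[0, 1]
```

This is the sense in which removing correlations alone does not remove higher order dependence.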


Definition (Mutual Information)

Definition (Shannon entropy)

Definition (KL divergence)

Mutual Information, Entropy
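
The formulas for these three definitions were lost in extraction. For reference, the standard textbook forms for continuous random variables with densities are sketched below. The notation (p, f, g, Y_i, M) is an assumption and may differ from the original slide.

```latex
% Shannon (differential) entropy of Y with density p
H(Y) = -\int p(y) \log p(y) \, dy

% Kullback-Leibler divergence between densities f and g
KL(f \,\|\, g) = \int f(y) \log \frac{f(y)}{g(y)} \, dy

% Mutual information of Y = (Y_1, \dots, Y_M), joint density p, marginals p_i
I(Y_1, \dots, Y_M)
  = \int p(y) \log \frac{p(y)}{\prod_{i=1}^{M} p_i(y_i)} \, dy
  = KL\Big( p \,\Big\|\, \prod_{i=1}^{M} p_i \Big)
  = \sum_{i=1}^{M} H(Y_i) - H(Y)
```

Mutual information is nonnegative and equals zero exactly when the components are independent, which is why it can serve as an ICA cost function.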


Solving the ICA problem with i.i.d. sources


Solving the ICA problem with i.i.d. sources


Theorem (Whitening)

Definitions

Note

Whitening


Proof of the whitening theorem


We can use PCA for whitening!

Proof of the whitening theorem
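
Whitening with PCA can be sketched as follows: eigendecompose the sample covariance and rescale each principal direction to unit variance. The function name `whiten`, the data layout (features in rows, samples in columns), and the example mixing matrix are assumptions made for illustration; the construction in the slide's proof may differ in detail.

```python
import numpy as np

def whiten(X):
    """Center and whiten X (shape: N features x T samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)        # remove the mean
    cov = Xc @ Xc.T / Xc.shape[1]                 # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # cov = E diag(d) E^T
    Q = np.diag(eigvals ** -0.5) @ eigvecs.T      # whitening matrix
    return Q @ Xc, Q

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(2, 5000))        # hypothetical sources
A = np.array([[2.0, 1.0],
              [1.0, 1.5]])                        # hypothetical mixing matrix
Z, Q = whiten(A @ S)
cov_Z = Z @ Z.T / Z.shape[1]                      # identity up to rounding
```

The sketch assumes the covariance matrix has full rank; a zero eigenvalue would need to be dropped or regularized before taking the inverse square root.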


Figure panels: original, mixed, whitened

Whitening solves half of the ICA problem

Note:

The number of free parameters of an N by N orthogonal matrix is N(N-1)/2, versus N^2 for a general N by N matrix.

⇒ whitening solves half of the ICA problem


ICA task: Given x,

• find y (the estimation of s),

• find W (the estimation of A^{-1})

ICA solution: y = Wx

• Remove mean, E[x] = 0

• Whitening, E[xx^T] = I

• Find an orthogonal W optimizing an objective function
  • Sequence of 2-d Jacobi (Givens) rotations

Figure panels: original, mixed, whitened, rotated (demixed)

Solving ICA


p q

p

q

Optimization Using Jacobi Rotation Matrices
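
A sequence of 2-d Jacobi (Givens) rotations can be sketched as below. Each rotation acts only on coordinates p and q, and its angle is chosen by a simple grid search. The contrast used here (sum of absolute kurtosis), the grid search, the number of sweeps, and all function names are assumptions for illustration; the slide's own objective function was not recoverable.

```python
import numpy as np
from itertools import combinations

def givens(n, p, q, theta):
    """n x n rotation by angle theta in the (p, q) coordinate plane."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[p, p], G[p, q] = c, -s
    G[q, p], G[q, q] = s, c
    return G

def contrast(Y):
    """Sum over rows of |E[y^4] - 3|; assumes zero-mean, unit-variance rows."""
    return np.sum(np.abs(np.mean(Y ** 4, axis=1) - 3.0))

def jacobi_ica(Z, n_sweeps=5, n_angles=90):
    """Z: whitened data (N x T). Returns an orthogonal demixing matrix W."""
    n = Z.shape[0]
    W = np.eye(n)
    thetas = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    for _ in range(n_sweeps):
        for p, q in combinations(range(n), 2):
            pair = (W @ Z)[[p, q]]   # only rows p and q change
            scores = [contrast(givens(2, 0, 1, th) @ pair) for th in thetas]
            best = thetas[int(np.argmax(scores))]
            W = givens(n, p, q, best) @ W
    return W

rng = np.random.default_rng(0)
# Unit-variance uniform sources (hypothetical test data).
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(3, 20_000))
# Hypothetical orthogonal mixing matrix built from three rotations.
A = givens(3, 0, 1, 0.7) @ givens(3, 1, 2, -0.4) @ givens(3, 0, 2, 1.1)
Z = A @ S            # approximately white already, since A is orthogonal
W = jacobi_ica(Z)
P = W @ A            # performance matrix, ideally a signed permutation
```

Angles in [0, π/2) are enough because sources can only be recovered up to permutation and sign.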


The Gaussian distribution is spherically symmetric (after whitening).

Mixing it with an orthogonal matrix… produces the same distribution...

No hope for recovery...

However, this is the only ‘nice’ distribution that we cannot recover!

Gaussian sources are problematic
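
The rotation invariance can be checked numerically. In the sketch below, a kurtosis-based contrast (an assumed choice of objective) is evaluated on rotated Gaussian data and on rotated uniform data. For Gaussian sources the contrast stays near zero at every angle, so no angle stands out; for uniform sources it varies clearly with the angle.

```python
import numpy as np

def rotation(theta):
    """2 x 2 rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def contrast(Y):
    """Sum over rows of |E[y^4] - 3|; assumes zero-mean, unit-variance rows."""
    return np.sum(np.abs(np.mean(Y ** 4, axis=1) - 3.0))

rng = np.random.default_rng(0)
T = 100_000
thetas = np.linspace(0.0, np.pi / 2, 10)

G = rng.standard_normal((2, T))                          # Gaussian sources
U = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))    # uniform sources

gauss_scores = np.array([contrast(rotation(th) @ G) for th in thetas])
unif_scores = np.array([contrast(rotation(th) @ U) for th in thetas])
```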


⇒ go away from the normal distribution

ICA Cost Functions


The (suitably normalized) sum of independent variables converges to the normal distribution

⇒ For separation go far away from the normal distribution

⇒ Negentropy, |kurtosis| maximization

Figs borrowed from Ata Kaban

Central Limit Theorem
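
A small numerical sketch of this argument: the excess kurtosis of a sum of k independent uniform variables shrinks toward 0 (the Gaussian value) as k grows, so mixtures look more Gaussian than their sources. The choice of uniform sources and the helper name `excess_kurtosis` are assumptions for illustration.

```python
import numpy as np

def excess_kurtosis(y):
    """E[y^4] - 3 after standardizing y to zero mean and unit variance."""
    y = (y - y.mean()) / y.std()
    return np.mean(y ** 4) - 3.0

rng = np.random.default_rng(0)
T = 200_000
kurts = {}
for k in (1, 2, 4):
    # Sum of k independent uniform variables, T samples each.
    total = rng.uniform(-1.0, 1.0, size=(k, T)).sum(axis=0)
    kurts[k] = excess_kurtosis(total)
```

For independent, identically distributed summands the excess kurtosis of the sum is the single-variable value divided by k, so the three estimates should lie near -1.2, -0.6 and -0.3.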


ICA Algorithms


There are more than 100 different ICA algorithms…

• Mutual information (MI) estimation
  • Kernel-ICA [Bach & Jordan, 2002]

• Entropy, negentropy estimation
  • Infomax ICA [Bell & Sejnowski 1995]
  • RADICAL [Learned-Miller & Fisher, 2003]
  • FastICA [Hyvarinen, 1999]
  • [Girolami & Fyfe 1997]

• ML estimation
  • KDICA [Chen, 2006]
  • EM-ICA [Welling]
  • [MacKay 1996; Pearlmutter & Parra 1996; Cardoso 1997]

• Higher order moments, cumulants based methods
  • JADE [Cardoso, 1993]

• Nonlinear correlation based methods
  • [Jutten and Herault, 1991]

Algorithms

David J.C. MacKay (97)

rows of W

Maximum Likelihood ICA Algorithm
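
The derivation on this slide was lost in extraction. As a hedged reference, the standard maximum likelihood formulation for the noiseless model x = As is sketched below, written in terms of the rows of W. The symbols p_i (source densities), φ_i, T, and L are assumptions of this sketch and may not match the notation of the slide or of MacKay's paper.

```latex
% Model: x = A s, W = A^{-1}, independent sources with densities p_i.
% w_i^T denotes the i-th row of W.
p_x(x) = |\det W| \prod_{i=1}^{N} p_i(w_i^T x)

% Average log-likelihood over samples x(1), \dots, x(T)
L(W) = \log |\det W|
     + \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{N} \log p_i\big(w_i^T x(t)\big)

% Gradient, with y = W x and \varphi_i = p_i' / p_i applied elementwise
\nabla_W L = W^{-T} + \frac{1}{T} \sum_{t=1}^{T} \varphi\big(y(t)\big) \, x(t)^T
```

Gradient ascent on L(W) with a fixed, assumed source density is one way to obtain a maximum likelihood ICA algorithm.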


Kurtosis = 4th order cumulant

Measures

• the distance from normality

• the degree of peakedness

ICA algorithm based on Kurtosis maximization
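
The formulas on this slide were lost in extraction. A standard way to write the kurtosis and the resulting contrast is sketched below; the symbols κ_4, z, and w are assumptions of this sketch.

```latex
% Kurtosis (4th-order cumulant) of a zero-mean random variable y
\kappa_4(y) = E[y^4] - 3 \big( E[y^2] \big)^2

% For unit variance this is E[y^4] - 3, which is 0 for a Gaussian.
% Contrast for whitened data z (E[z z^T] = I) and a unit vector w:
\max_{\|w\| = 1} \; \big| \kappa_4(w^T z) \big|
```

Each maximizer w gives one row of the demixing matrix for the whitened data.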


Probably the most famous ICA algorithm

The Fast ICA algorithm (Hyvarinen)
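
The update rule on this slide did not survive extraction. The sketch below shows one commonly cited variant, the kurtosis-based fixed-point iteration w ← E[z (w^T z)^3] − 3w followed by normalization, with deflation to extract one row at a time. The function names, test sources, and mixing matrix are assumptions; FastICA also exists with other nonlinearities and with symmetric orthogonalization, which are not shown.

```python
import numpy as np

def whiten(X):
    """Center and whiten X (N x T) using the covariance eigendecomposition."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    Q = np.diag(d ** -0.5) @ E.T
    return Q @ Xc, Q

def fastica_kurtosis(Z, n_iter=200, tol=1e-10, seed=0):
    """Z: whitened data (N x T). Returns W with orthonormal rows."""
    rng = np.random.default_rng(seed)
    n = Z.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            y = w @ Z
            w_new = (Z * y ** 3).mean(axis=1) - 3.0 * w   # fixed-point step
            w_new -= W[:i].T @ (W[:i] @ w_new)            # deflation
            w_new /= np.linalg.norm(w_new)
            # Converged when w_new equals w up to sign.
            done = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if done:
                break
        W[i] = w
    return W

rng = np.random.default_rng(1)
T = 20_000
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), T),      # sub-Gaussian
               rng.laplace(scale=1 / np.sqrt(2), size=T)])   # super-Gaussian
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])           # hypothetical mixing matrix
Z, Q = whiten(A @ S)
W = fastica_kurtosis(Z)
P = W @ Q @ A                        # ideally close to a signed permutation
```

The convergence test compares directions up to sign because the iterate can flip sign at every step when the extracted source has negative kurtosis.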


Independent Subspace Analysis


Original

Separated

Mixed

Hinton diagram

Independent Subspace Analysis


Numerical Simulations 2D Letters (i.i.d.)

Sources Observation

Estimated sources

Performance matrix

Independent Subspace Analysis


Independent Subspace Analysis


Thanks for the Attention!