Independent Component Analysis

Page 1: Independent Component Analysis

Independent Component Analysis

Page 2: Independent Component Analysis

What is ICA?

“Independent component analysis (ICA) is a method for finding underlying factors or components from multivariate (multi-dimensional) statistical data. What distinguishes ICA from other methods is that it looks for components that are both statistically independent, and nonGaussian.”

A. Hyvärinen, J. Karhunen, E. Oja

‘Independent Component Analysis’

Page 3: Independent Component Analysis

ICA

Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the identification & separation of mixtures of sources with little prior information.

• Applications include:

– Audio processing
– Medical data
– Finance
– Array processing (beamforming)
– Coding

• … and most applications where Factor Analysis and PCA are currently used.

• While PCA seeks the directions that best represent the data in a minimum squared reconstruction error (Σ|x0 − x|²) sense, ICA seeks the directions that are most statistically independent from each other. It is often used for time-series separation of multiple targets.

Page 4: Independent Component Analysis

ICA estimation principles, from A. Hyvärinen, J. Karhunen, E. Oja, ‘Independent Component Analysis’

• Principle 1: “Nonlinear decorrelation. Find the matrix W so that for any i ≠ j , the components yi and yj are uncorrelated, and the transformed components g(yi) and h(yj) are uncorrelated, where g and h are some suitable nonlinear functions.”

• Principle 2: “Maximum nongaussianity. Find the local maxima of nongaussianity of a linear combination y = Wx under the constraint that the variance of y is constant. Each local maximum gives one independent component.”

Page 5: Independent Component Analysis

ICA mathematical approach, from A. Hyvärinen, J. Karhunen, E. Oja, ‘Independent Component Analysis’

“Given a set of observations of random variables x1(t), x2(t)…xn(t), where t is the time or sample index, assume that they are generated as a linear mixture of independent components: y=Wx, where W is some unknown matrix. Independent component analysis now consists of estimating both the matrix W and the yi(t), when we only observe the xi(t).”

Page 6: Independent Component Analysis

The simple “Cocktail Party” Problem

[Figure: sources s1, s2 are combined by the mixing matrix A into the observations x1, x2, i.e. x = As.]

n sources, m = n observations
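As a minimal sketch of this generative model (assuming NumPy and two illustrative source signals that are not in the original slides):

import numpy as np

t = np.linspace(0, 1, 250)
s1 = np.sin(2 * np.pi * 5 * t)             # speaker 1: sinusoid
s2 = np.sign(np.sin(2 * np.pi * 3 * t))    # speaker 2: square wave
s = np.vstack([s1, s2])                    # sources, shape (n, T) = (2, 250)

A = np.array([[1.0, 0.5],                  # hypothetical mixing matrix; entries
              [0.6, 1.0]])                 # depend on speaker-mic distances

x = A @ s                                  # microphone observations, x = As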

Page 7: Independent Component Analysis

Classical ICA (FastICA) estimation

[Figure: four observed signals V1–V4 plotted over samples 0–250; ICA separates the observed signals into the original source signals.]
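A minimal usage sketch for this step (assuming scikit-learn's FastICA; the signals are illustrative, not the V1–V4 data from the slide):

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 250)
s = np.vstack([np.sin(2 * np.pi * 5 * t),            # source 1
               np.sign(np.sin(2 * np.pi * 3 * t))])  # source 2
x = np.array([[1.0, 0.5],
              [0.6, 1.0]]) @ s                       # observed mixtures

ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
s_est = ica.fit_transform(x.T)   # estimated sources, shape (T, n)
A_est = ica.mixing_              # estimated mixing matrix

The sources come back only up to order, sign, and scale (the ambiguities discussed on Page 13).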

Page 8: Independent Component Analysis

Motivation

Two independent sources; mixture at two mics

x1(t) = a11·s1 + a12·s2
x2(t) = a21·s1 + a22·s2

The coefficients aij depend on the distances of the microphones from the speakers.

Page 9: Independent Component Analysis

Motivation

Get the Independent Signals out of the Mixture

Page 10: Independent Component Analysis

ICA Model (Noise Free)

• Use a statistical “latent variables” system

• Random variables sk instead of time signals

• xj = aj1s1 + aj2s2 + … + ajnsn, for all j

x = As

• The ICs s are latent variables & are unknown, AND the mixing matrix A is also unknown

• Task: estimate A and s using only the observable random vector x

• Let's assume that the no. of ICs = no. of observable mixtures, and that A is square and invertible

• So after estimating A, we can compute W = A⁻¹ and hence

s = Wx = A⁻¹x
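A quick numerical check of this algebra (a sketch assuming NumPy; in real ICA, A is unknown and W must be estimated):

import numpy as np

rng = np.random.default_rng(1)
s = rng.uniform(-1, 1, size=(2, 1000))   # independent sources
A = np.array([[1.0, 0.5],
              [0.6, 1.0]])               # square, invertible mixing matrix
x = A @ s                                # observed mixtures

W = np.linalg.inv(A)                     # W = A^-1
print(np.allclose(W @ x, s))             # True: s = Wx = A^-1 x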

Page 11: Independent Component Analysis

Illustration

2 ICs with a uniform distribution:

p(si) = 1/(2√3) if |si| ≤ √3, and 0 otherwise

Zero mean and variance equal to 1.

The mixing matrix A is

A = | 2  3 |
    | 2  1 |

The edges of the parallelogram formed by the joint density of x1 & x2 are in the directions of the columns of A. So if we can estimate the joint pdf of x1 & x2 and then locate the edges, we can estimate A.
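A short sketch reproducing this illustration (assuming NumPy and Matplotlib; the bound √3 gives the sources unit variance):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))  # uniform, unit-variance ICs
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
x = A @ s   # the scatter of (x1, x2) is a parallelogram with edges along A's columns

plt.scatter(x[0], x[1], s=2)
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()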

Page 12: Independent Component Analysis

Restrictions

• si are statistically independent

– p(s1,s2) = p(s1)p(s2)

• Nongaussian distributions

– The joint density of unit-variance Gaussian s1 & s2 is rotationally symmetric, so it doesn't contain any information about the directions of the columns of the mixing matrix A. So A can't be estimated.

– If only one IC is gaussian, the estimation is still possible.

Joint density of two unit-variance Gaussians:

p(x1, x2) = (1/2π) · exp( −(x1² + x2²) / 2 )
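A numerical illustration of why Gaussian sources are a problem (a sketch assuming NumPy): orthogonally mixing unit-variance Gaussians leaves their joint distribution unchanged, so the mixing cannot be identified.

import numpy as np

rng = np.random.default_rng(3)
s = rng.standard_normal((2, 100000))            # independent unit-variance Gaussians

theta = 0.7                                     # arbitrary rotation (orthogonal mixing)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = R @ s

print(np.round(np.cov(s), 2))   # ~ identity
print(np.round(np.cov(x), 2))   # ~ identity: the mixture looks just like the sources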

Page 13: Independent Component Analysis

Ambiguities

• Can't determine the variances (energies) of the ICs

– Both s & A are unknown; any scalar multiple in one of the sources can always be cancelled by dividing the corresponding column of A by it.

– Fix the magnitudes of the ICs by assuming unit variance: E{si²} = 1

– Only the ambiguity of sign remains

• Can't determine the order of the ICs

– Terms can be freely exchanged, because both s and A are unknown. So we can call any IC the first one.

Page 14: Independent Component Analysis

ICA Principle (Non-Gaussian is Independent)

• The key to estimating A is non-gaussianity.

• The distribution of a sum of independent random variables tends toward a Gaussian distribution (by the CLT): the density f(x1) = f(s1 + s2) is closer to Gaussian than f(s1) or f(s2).

y = wᵀx = wᵀAs = zᵀs

• Here w is one of the rows of the matrix W, so y is a linear combination of the si, with weights given by z = Aᵀw.

• Since a sum of two independent r.v.'s is more Gaussian than the individual r.v.'s, zᵀs is more Gaussian than either of the si, and becomes least Gaussian when it equals one of the si.

• So we take w as a vector that maximizes the non-gaussianity of wᵀx. Such a w corresponds to a z with only one nonzero component, so we recover one of the si.
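A quick check of this claim (a sketch assuming NumPy and SciPy's excess kurtosis, which is 0 for a Gaussian):

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(4)
s1 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100000)   # sub-Gaussian source, kurt ~ -1.2
s2 = rng.uniform(-np.sqrt(3), np.sqrt(3), 100000)
y = (s1 + s2) / np.sqrt(2)                          # unit-variance sum

print(kurtosis(s1), kurtosis(s2))   # ~ -1.2 each
print(kurtosis(y))                  # ~ -0.6: closer to 0, i.e. more Gaussian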

Page 15: Independent Component Analysis

How to estimate the ICA model

• Principle for estimating the model of ICA

Maximization of NonGaussianity

Page 16: Independent Component Analysis

Measures for NonGaussianity

• Kurtosis

kurt(x) = E{(x − μ)⁴} − 3·[E{(x − μ)²}]²

Super-Gaussian: kurtosis > 0
Gaussian: kurtosis = 0
Sub-Gaussian: kurtosis < 0

For independent x1, x2 and a scalar α:

kurt(x1 + x2) = kurt(x1) + kurt(x2)
kurt(αx1) = α⁴·kurt(x1)
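A minimal sketch computing this measure (assuming NumPy; μ is removed by centering first):

import numpy as np

def kurt(x):
    # kurt(x) = E{(x - mu)^4} - 3 * [E{(x - mu)^2}]^2
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

rng = np.random.default_rng(5)
print(kurt(rng.laplace(size=100000)))       # > 0: super-Gaussian
print(kurt(rng.standard_normal(100000)))    # ~ 0: Gaussian
print(kurt(rng.uniform(-1, 1, 100000)))     # < 0: sub-Gaussian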

Page 17: Independent Component Analysis

Measures of Non-Gaussianity

• We need a quantitative measure of non-gaussianity for ICA estimation:

• Kurtosis: Gaussian = 0 (sensitive to outliers)

kurt(y) = E{y⁴} − 3·(E{y²})²

• Entropy: the Gaussian has the largest entropy among variables of equal variance

H(y) = −∫ f(y) log f(y) dy

• Negentropy: Gaussian = 0 (difficult to estimate)

J(y) = H(y_gauss) − H(y)

• Approximations:

J(y) ≈ (1/12)·E{y³}² + (1/48)·kurt(y)²

J(y) ∝ [ E{G(y)} − E{G(v)} ]²

where v is a standard Gaussian random variable and:

G1(y) = (1/a1)·log cosh(a1·y)
G2(y) = −exp(−y²/2)
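A short sketch of the G-based negentropy approximation (assuming NumPy, a1 = 1, and a sampled Gaussian reference v):

import numpy as np

def negentropy_approx(y, rng, a1=1.0):
    # J(y) ~ [E{G(y)} - E{G(v)}]^2 with G(y) = (1/a1) * log cosh(a1 * y)
    y = (y - y.mean()) / y.std()          # standardize: zero mean, unit variance
    v = rng.standard_normal(y.size)       # standard Gaussian reference
    G = lambda u: np.log(np.cosh(a1 * u)) / a1
    return (G(y).mean() - G(v).mean()) ** 2

rng = np.random.default_rng(6)
print(negentropy_approx(rng.standard_normal(100000), rng))  # ~ 0 for a Gaussian
print(negentropy_approx(rng.laplace(size=100000), rng))     # > 0 for a non-Gaussian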

Page 18: Independent Component Analysis

Data Centering & Whitening

• Centering

x = x′ − E{x′}

– This doesn't mean that ICA can't estimate the mean; it just simplifies the algorithm.
– The ICs are also zero-mean, because E{s} = W·E{x}.
– After ICA, add W·E{x′} to the zero-mean ICs.

• Whitening

– We transform the x's linearly so that the x̃ are white (uncorrelated, with unit variance). This is done by eigenvalue decomposition (EVD) of the covariance E{xxᵀ} = E·D·Eᵀ:

x̃ = (E·D^(−1/2)·Eᵀ)·x = E·D^(−1/2)·Eᵀ·A·s = Ã·s

– So we only have to estimate the orthonormal matrix Ã.
– An orthonormal matrix has n(n−1)/2 degrees of freedom, so for large-dimensional A we have to estimate only about half as many parameters. This greatly simplifies ICA.

• Reducing the dimension of the data (keeping the dominant eigenvalues) while whitening also helps.
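A minimal EVD-whitening sketch (assuming NumPy; x is any centered (n, T) data matrix, here a uniform mixture for concreteness):

import numpy as np

rng = np.random.default_rng(7)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 10000))
x = np.array([[2.0, 3.0],
              [2.0, 1.0]]) @ s
x = x - x.mean(axis=1, keepdims=True)        # center

C = np.cov(x)                                # E{xx^T} = E D E^T
d, E = np.linalg.eigh(C)                     # eigenvalues d, eigenvectors E
x_w = (E @ np.diag(d ** -0.5) @ E.T) @ x     # x~ = E D^(-1/2) E^T x

print(np.round(np.cov(x_w), 2))              # ~ identity: whitened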

Page 19: Independent Component Analysis

Computing the pre-processing steps for ICA

0) Centering = make the signals centered at zero

xi ← xi − E[xi] for each i

1) Sphering = make the signals uncorrelated, i.e. apply a transform V to x such that Cov(Vx) = I // where Cov(y) = E[yyᵀ] denotes the covariance matrix

V = E[xxᵀ]^(−1/2) // can be done using the ‘sqrtm’ function in MatLab
x ← Vx // for all t (indexes t dropped here) // bold lowercase refers to a column vector; bold uppercase to a matrix

Scope: to make the remaining computations simpler. It is known that independent variables must be uncorrelated, so this can be fulfilled before proceeding to the full ICA.
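A Python rendering of this pseudocode (a sketch using SciPy's sqrtm in place of MatLab's):

import numpy as np
from scipy.linalg import sqrtm

def preprocess(x):
    # x: (n, T) data matrix
    x = x - x.mean(axis=1, keepdims=True)          # 0) centering: xi <- xi - E[xi]
    V = np.linalg.inv(np.real(sqrtm(np.cov(x))))   # 1) sphering: V = E[xx^T]^(-1/2)
    return V @ x                                   # Cov(Vx) = I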

Page 20: Independent Component Analysis

Application domains of ICA

• Blind source separation (Bell & Sejnowski, Te-Won Lee, Girolami, Hyvärinen, etc.)
• Image denoising (Hyvärinen)
• Medical signal processing – fMRI, ECG, EEG (Makeig)
• Modelling of the hippocampus and visual cortex (Lőrincz, Hyvärinen)
• Feature extraction, face recognition (Marni Bartlett)
• Compression, redundancy reduction
• Watermarking (D. Lowe)
• Clustering (Girolami, Kolenda)
• Time series analysis (Back, Valpola)
• Topic extraction (Kolenda, Bingham, Kaban)
• Scientific data mining (Kaban, etc.)

Page 21: Independent Component Analysis

Image denoising

[Figure: comparison of an original image, a noisy image, and the results of Wiener filtering vs. ICA filtering.]