12. Microscopic Structure of Bilinear Chemical Data
IASBS, Bahman 2-3, 1392 (January 22-23, 2014)
Independent Component Analysis (ICA)
Hadi Parastar
Sharif University of Technology
Every problem becomes very childish when once it is explained to you.
—Sherlock Holmes (The Dancing Men, A.C. Doyle, 1905)
Representation of Multivariate Data
- The key to understanding and interpreting multivariate data is a suitable representation
- Such a representation is achieved using some kind of transform
- Transforms can be linear or non-linear
- A linear transform W applied to a data matrix X, with objects as rows and variables as columns, is as follows:
U = WX + E
- Broadly speaking, linear transforms can be classified into two groups: second-order methods and higher-order methods
Linear Transform Techniques
Second-order methods:
- Principal component analysis (PCA)
- Factor analysis (FA) based methods
- Multivariate curve resolution (MCR)
Soft-modeling methods:
- Factor Analysis (FA)
- Principal Component Analysis (PCA)
- Blind source separation (BSS)
- Independent Component Analysis (ICA)
hplc.m: simulating HPLC-DAD data (a minimal sketch of such a simulation follows)
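The script hplc.m itself is not reproduced on the slides; the following is a minimal sketch of such a bilinear simulation, assuming Gaussian elution profiles and spectra (all names and parameter values here are illustrative, not the workshop's actual code):

% Bilinear HPLC-DAD simulation: X = C*S' + noise
t = (1:100)';  w = (1:50)';                       % time and wavelength axes
C = [exp(-(t-40).^2/50), exp(-(t-55).^2/80)];     % two Gaussian elution profiles
S = [exp(-(w-20).^2/40), exp(-(w-30).^2/60)];     % two Gaussian spectra
X = C*S' + 0.01*randn(100,50);                    % 100 x 50 data matrix
surf(X); shading interp                           % quick look at the data landscape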
emgpeak.m: chromatograms with distortions (tailed peaks; see the sketch below)
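emgpeak.m is not shown either; as the script name suggests, distorted chromatographic peaks are commonly modeled with an exponentially modified Gaussian (EMG), a Gaussian convolved with an exponential decay. A sketch with illustrative parameters:

% Exponentially modified Gaussian (EMG) peak
t = (1:150)';  mu = 50;  sigma = 4;  tau = 12;  h = 1;
emg = (h*sigma/tau)*sqrt(pi/2) .* exp(0.5*(sigma/tau)^2 - (t-mu)/tau) ...
      .* erfc((sigma/tau - (t-mu)/sigma)/sqrt(2));
plot(t, emg)      % tailed peak; tau -> 0 recovers the symmetric Gaussian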
Basic statistics
- Expectation
- Mean
- Correlation matrix
- Covariance matrix
(The defining equations, lost in extraction, are restated below.)
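The equations on these two slides did not survive extraction; the standard definitions they refer to, with the "Note" plausibly being the identity linking the two matrices, are:

E{x} = ∫ x p(x) dx            (expectation)
m = E{x}                      (mean vector)
R = E{x x^T}                  (correlation matrix)
C = E{(x - m)(x - m)^T}       (covariance matrix)
Note: C = R - m m^T, so for centered data (m = 0) the two matrices coincide.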
Principal Component Analysis (PCA)
- Using an eigenvector rotation, it is possible to decompose the X matrix into a series of loadings and scores
- Underlying or intrinsic factors (in the original psychometric applications, factors related to intelligence) could then be detected
- In chemistry, this approach is applied by diagonalizing the correlation or covariance matrix
Principal component analysis (PCA)
X = T P^T + E     (data = model + noise)
where X is the raw data matrix, T holds the scores, P^T the loadings, and E the residuals.
T P^T carries the explained variance; E carries the residual variance.
PCA model: D = U V^T + E
D = u1 v1^T + u2 v2^T + … + un vn^T + E
- n is the number of components (<< number of variables in D)
- each term ui vi^T has rank 1
- U holds the scores, V the loadings (projections), and E the unexplained variance
[Figure: data points plotted in the original coordinates (x1, x2), then re-expressed in the principal-component coordinates (u1, u2)]
Each principal component is a linear combination of the original variables:
u1 = a x1 + b x2
u2 = c x1 + d x2
PCA.m: demo script (a minimal SVD-based sketch follows)
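PCA.m is not reproduced on the slides; a minimal SVD-based PCA, assuming the data matrix has objects as rows (the example data and variable names are illustrative):

% PCA via SVD of the column-centered data
X = randn(100, 50);                  % example data (objects x variables)
Xc = X - mean(X, 1);                 % center each variable
[U, Sv, V] = svd(Xc, 'econ');        % economy-size SVD
T = U * Sv;                          % scores
P = V;                               % loadings
expl = 100 * diag(Sv).^2 / sum(diag(Sv).^2);   % explained variance per PC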
Inner Product (Dot Product)
x · x = x^T x = x1^2 + x2^2 + … + xn^2 = ||x||^2
x · y = x^T y = ||x|| ||y|| cos(q)
The cosine of the angle between two vectors is equal to the dot product of the normalized vectors:
cos(q) = (x · y) / (||x|| ||y||)
x · y = ||x|| ||y||   (parallel vectors)
x · y = -||x|| ||y||  (antiparallel vectors)
x · y = 0             (orthogonal vectors)
Two vectors x and y are orthogonal when their scalar product is zero.
Two vectors x and y are orthonormal when x · y = 0 and ||x|| = ||y|| = 1.
[Figure: PCA uses an orthogonal coordinate system (PC1, PC2); ICA uses a nonorthogonal coordinate system]
Independent Component Analysis: What Is It?
- ICA belongs to a class of blind source separation (BSS) methods
- The goal of BSS is to separate data into underlying informational components, where such data can take the form of spectra, images, sounds, telecommunication channels or stock market prices
- The term "blind" implies that such methods can separate data into source signals even if very little is known about the nature of those source signals
The Principle of ICA: the cocktail-party problem
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
Independent Component Analysis (Herault and Jutten, 1991)
The observed vector x is modeled by a linear latent-variable model:
xi = Σj aij sj   (j = 1, …, m)
or, in matrix form:
x = As   (for the whole data set, X = AS)
where:
--- The mixing matrix A is constant
--- The si are latent variables called the independent components
--- Both A and s must be estimated, observing only x
Independent Component Analysis
X = A S^T + E    (ICA bilinear model)
ICA algorithms try to find the independent sources:
Ŝ^T = W X, with W = A^-1, so that Ŝ^T = W X = A^-1 A S^T = S^T
Compare: X = T P^T + E (PCA model) and X = C S^T + E (MCR model).
Independent Component Analysis Model
X = A S^T     Ŝ^T = W X
Basic properties of the ICA model
Must assume:
- The si are independent
- The si are nongaussian
- For simplicity: the matrix A is square
Consequences:
- The si are defined only up to a multiplicative constant
- The si are not ordered
[Figure: original sources compared with the ICA-recovered sources]
Statistical Independence
If two or more signals are statistically independent of each other, then the value of one signal provides no information about the values of the other signals.
For two variables: p(x1, x2) = p(x1) p(x2)
For more than two variables: p(x1, …, xn) = p(x1) p(x2) … p(xn)
Using the expectation operator: E{g1(x1) g2(x2)} = E{g1(x1)} E{g2(x2)} for any functions g1 and g2 (a numerical check follows).
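The expectation identity is easy to verify numerically; a quick illustrative check with two independently generated samples:

% For independent x and y, E{g1(x) g2(y)} = E{g1(x)} E{g2(y)}
x = randn(1e5, 1);  y = rand(1e5, 1);     % independent by construction
lhs = mean(x.^2 .* y.^3);                 % E{x^2 y^3}
rhs = mean(x.^2) * mean(y.^3);            % E{x^2} E{y^3}
fprintf('lhs = %.4f, rhs = %.4f\n', lhs, rhs)   % both close to 0.25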
Probability Density Function
Moments of probability density functions; a PDF is essentially a normalized histogram.
[Figure: histogram → approximation of the PDF → PDF]
Independence and Correlation
The term "correlated" tends to be used colloquially to suggest that two variables are related in a very general sense.
For independent variables, the entire structure of the joint pdf is implicit in the marginal pdfs, because the joint pdf can then be reconstructed exactly as the product of its marginal pdfs.
Covariance between x and y: cov(x, y) = E{(x - E{x})(y - E{y})}
[Figure: marginal PDFs and the joint PDF]
Independence and Correlation
The formal similarity between measures of independence and correlation can be interpreted as follows:
- Correlation is a measure of the amount of covariation between x and y, and depends only on the first moments of the pdf pxy.
- Independence is a measure of the covariation between [x raised to powers p] and [y raised to powers q], and depends on all moments of the pdf pxy.
Thus, independence can be considered a generalized measure of correlation: independence implies uncorrelatedness, but uncorrelatedness does not imply independence.
emgpeak.m: chromatograms with distortions
[Figure: pairs of simulated distorted chromatograms, on 0-150 and 0-200 time axes]
MutualInfo.m: joint and marginal probability density functions (a histogram-based sketch follows)
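MutualInfo.m is not reproduced; a self-contained sketch of estimating marginal and joint PDFs by histograms and computing the mutual information they define (bin count and variable names are illustrative):

% Histogram estimates of the marginal and joint PDFs, then MI
x1 = randn(5000, 1);  x2 = randn(5000, 1);   % independent, for illustration
nb = 20;                                     % number of bins per variable
p12 = histcounts2(x1, x2, nb, 'Normalization', 'probability');  % joint PDF
p1 = sum(p12, 2);  p2 = sum(p12, 1);         % marginals from the joint
pp = p1 * p2;                                % product of the marginals
nz = p12 > 0;                                % avoid log(0)
MI = sum(p12(nz) .* log(p12(nz) ./ pp(nz)))  % near 0 for independent signals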
[Figure: two pairs of signals on a 0-150 time axis, with their probability summaries]
Left pair:  Joint PDF = 0.0879, Marginal PDF 1 = 0.3017, Marginal PDF 2 = 0.3017, Correlation = -0.1847; 0.3017 × 0.3017 = 0.0910 ≈ 0.0879 (nearly independent)
Right pair: Joint PDF = 0.4335, Marginal PDF 1 = 0.3017, Marginal PDF 2 = 0.3017, Correlation = 0.9701; 0.3017 × 0.3017 = 0.0910 ≠ 0.4335 (dependent)
[Figure: two pairs of signals on a 0-200 time axis, with their probability summaries]
Left pair:  Joint PDF = 0.0816, Marginal PDF 1 = 0.2013, Marginal PDF 2 = 0.4266, Correlation = -0.2123; 0.2013 × 0.4266 = 0.0859 ≈ 0.0816 (nearly independent)
Right pair: Joint PDF = 0.1317, Marginal PDF 1 = 0.2038, Marginal PDF 2 = 0.4265, Correlation = 0.7339; 0.2013 × 0.4266 = 0.0859 ≠ 0.1317 (dependent)
What does nongaussianity mean in ICA?
- Intuitively, one can say that Gaussian distributions are "too simple".
- The higher-order cumulants are zero for Gaussian distributions, but such higher-order information is essential for estimating the ICA model.
- Higher-order methods use information on the distribution of x that is not contained in the covariance matrix.
- The distribution of x must therefore not be assumed Gaussian, because all the information about Gaussian variables is contained in the covariance matrix.
- Thus, ICA is essentially impossible if the observed variables have Gaussian distributions.
- Note that the basic model does not assume we know what the nongaussian distributions of the ICs look like; if they are known, the problem is considerably simplified.
Assume the joint distribution of two ICs, s1 and s2, is Gaussian with unit variance:
p(s1, s2) = (1/2π) exp(-(s1^2 + s2^2)/2)
With an orthogonal mixing matrix A, the joint density of the mixtures x1 and x2 is as follows:
p(x1, x2) = (1/2π) exp(-(x1^2 + x2^2)/2)
What does nongaussianity mean in ICA?
Due to orthogonality, ||A^T x|| = ||x||, so the mixing matrix drops out of the density.
We see that the orthogonal mixing matrix does not change the pdf, since it does not appear in the pdf at all.
The original and mixed distributions are identical; therefore, there is no way we could infer the mixing matrix from the mixtures.
How to estimate the ICA model
- Principle for estimating the ICA model: maximization of nongaussianity
- Nongaussianity → Independence
Nongaussianity Measures
- Kurtosis: fourth-order cumulant
- Entropy
- Negentropy: differential entropy
- Mutual information
Kurtosis
- Extrema of kurtosis give independent components
- If x is Gaussian, then kurt(x) = 0: the kurtosis is zero for Gaussian variables
- Variables with positive kurtosis are called supergaussian
- Variables with negative kurtosis are called subgaussian
Measures for Nongaussianity: Kurtosis
kurt(x) = E{(x - μ)^4} - 3 [E{(x - μ)^2}]^2
- Supergaussian: kurtosis > 0
- Gaussian: kurtosis = 0
- Subgaussian: kurtosis < 0
For independent x1 and x2: kurt(x1 + x2) = kurt(x1) + kurt(x2)
For a scalar α: kurt(α x1) = α^4 kurt(x1)
(A numerical illustration follows.)
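A quick illustrative check of the sign of the kurtosis for the three classes (the distribution choices are mine, not the slides'):

% Sign of kurt(x) = E{x^4} - 3[E{x^2}]^2 for zero-mean x
n = 1e5;
k = @(x) mean(x.^4) - 3 * mean(x.^2).^2;
g = randn(n, 1);                                   % Gaussian:       ~0
l = sign(rand(n,1) - 0.5) .* (-log(rand(n,1)));    % Laplacian-like: > 0
u = (rand(n,1) - 0.5) * sqrt(12);                  % uniform:        < 0
fprintf('gaussian %.2f  supergaussian %.2f  subgaussian %.2f\n', k(g), k(l), k(u))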
Mutual Information
- Mutual information (MI) can be defined as a natural measure of the mutual dependence between two variables.
- MI is always non-negative, and it is zero if the two variables are independent.
- MI can be defined using the joint and marginal PDFs as follows:
I(x1, x2) = ∫∫ p(x1, x2) log [ p(x1, x2) / (p(x1) p(x2)) ] dx1 dx2
Mutual Information Based on Entropy
- Entropy is a measure of the uniformity of the distribution of a bounded set of values; complete uniformity corresponds to maximum entropy.
- In information-theoretic terms, entropy is the measure of the randomness of a signal.
- A Gaussian signal has the largest entropy among all signal distributions of unit variance.
- Entropy is small for signals whose distribution is concentrated on certain values, i.e. whose pdf is very "spiky".
H(xi) = -∫ p(xi) log p(xi) dxi
H(x1, x2) = -∫∫ p(x1, x2) log p(x1, x2) dx1 dx2
I(x1, x2) = H(x1) + H(x2) - H(x1, x2)
Entropy can thus be used as a measure of nongaussianity.
Ambiguities in ICA solutions
- Scale (intensity) ambiguity: xij = Σn ain snj = Σn (ain / kn)(kn snj) for any nonzero constants kn
- Permutation ambiguity: X = (A T)(T^-1 S^T) + E = C S^T + E, with C = A T and the permuted sources T^-1 S^T, for any permutation matrix T
Central Limit Theorem (CLT)
[Figure: a Gaussian PDF]
The distribution of a sum of independent random variables tends toward a Gaussian distribution:
Observed signal = m1·IC1 + m2·IC2 + … + mn·ICn, where each IC is non-Gaussian but the sum drifts toward Gaussian.
Fortunately, the CLT does not place restrictions on how much each source signal contributes to a signal mixture, so this result holds true even if the mixing coefficients are not equal to unity. (A numerical illustration follows.)
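The CLT effect is easy to see numerically; an illustrative sketch (the source distribution and counts are my choices):

% Excess kurtosis of a sum of independent non-Gaussian sources
% shrinks toward the Gaussian value 0 as more sources are added
n = 1e5;
k = @(x) mean(x.^4) ./ mean(x.^2).^2 - 3;          % excess kurtosis
s = sign(rand(n,8) - 0.5) .* (-log(rand(n,8)));    % 8 supergaussian sources
for m = [1 2 4 8]
    fprintf('m = %d  excess kurtosis = %.2f\n', m, k(sum(s(:,1:m), 2)));
end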
Preprocessing: Centering
--- This step simplifies ICA algorithms by allowing us to assume a zero mean:
xc = x - E{x} = x - m
Preprocessing: Whitening
--- Whitening involves linearly transforming the observation vector such that its components are uncorrelated and have unit variance:
E{xw xw^T} = I, where xw is the whitened vector
--- A simple method to perform the whitening transformation is to use the EigenValue Decomposition (EVD) of the covariance of x:
E{x x^T} = V D V^T
xw = V D^(-1/2) V^T x = V D^(-1/2) V^T A s = Aw s
E{xw xw^T} = Aw E{s s^T} Aw^T = Aw Aw^T = I
Whitening thus reduces the number of parameters to be estimated (the new mixing matrix Aw is orthogonal); a sketch follows.
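A self-contained sketch of EVD whitening matching the formulas above (the example data is illustrative):

% EVD whitening of a centered d x N data matrix X
X = [1 2; 1 1] * randn(2, 1000);                % example mixed data
X = X - mean(X, 2);                             % center the rows
[V, D] = eig(cov(X'));                          % EVD of the covariance
Xw = V * diag(1 ./ sqrt(diag(D))) * V' * X;     % Xw = V D^(-1/2) V' X
disp(cov(Xw'))                                  % approximately the identity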
% Two independent Gaussian sources, then a linear mixture
S1 = randn(1000, 1);  S2 = randn(1000, 1);
plot(S1, S2, '*');
A = [1 2; 1 1];               % mixing matrix
S = [S1'; S2'];               % 2 x 1000 source matrix
X = A * S;                    % mixed data
figure; plot(X(1,:), X(2,:), '*');
pcamat.m: PCA decomposition (FastICA package)
whitenv.m: for data whitening
[E, D] = pcamat(X);
Xw = whitenv(X, E, D);
plot(Xw(1,:), Xw(2,:), '*');
Objective (contrast) functions for ICA
ICA method = objective function + optimization algorithm
- The statistical properties of the ICA method depend on the choice of objective function: consistency, robustness, asymptotic variance
- The algorithmic properties depend on the optimization algorithm: convergence speed, memory requirements, numerical stability
Different ICA Algorithms
- FastICA
- Information Maximization (Infomax)
- Joint Approximate Diagonalization of Eigenmatrices (JADE)
- Robust Accurate Direct Independent Component analysis aLgorithm (RADICAL)
- Mutual Information based Least-dependent Component Analysis (MILCA)
- Stochastic Nonnegative ICA (SNICA)
- Mean-Field ICA (MFICA)
- Window ICA (WICA)
- Kernel ICA (KICA)
- Group ICA (GICA)
(A sketch of the fixed-point idea behind FastICA follows.)
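None of these implementations are reproduced here; as an illustration of the fixed-point idea behind the kurtosis-based FastICA (one component only; the data generation and names are my own, not the workshop's code):

% Two supergaussian sources, mixed, centered and whitened
n = 2000;
S = sign(rand(2,n) - 0.5) .* (-log(rand(2,n)));   % Laplacian-like sources
X = [1 2; 1 1] * S;  X = X - mean(X, 2);
[V, D] = eig(cov(X'));  Xw = V * diag(1./sqrt(diag(D))) * V' * X;
% One-unit fixed-point iteration maximizing |kurtosis|
w = randn(2, 1);  w = w / norm(w);
for it = 1:200
    wNew = mean(Xw .* (w'*Xw).^3, 2) - 3*w;       % E{x (w'x)^3} - 3w
    wNew = wNew / norm(wNew);
    if abs(abs(wNew'*w) - 1) < 1e-8, w = wNew; break; end
    w = wNew;
end
ic1 = w' * Xw;                                    % one recovered component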
[Figure: five simulated two-signal data sets, X1-X5, on a 0-150 time axis]
Data   MPDF(1)   MPDF(2)   JPDF
X1     0.3017    0.3017    0.0879
X2     0.3017    0.3017    0.0878
X3     0.3017    0.3017    0.0932
X4     0.3017    0.3017    0.1141
X5     0.3017    0.3017    0.4335

Data   Independence   Correlation
X1     0.0373         -0.185
X2     0.0355         -0.182
X3     0.0649         -0.053
X4     0.3082          0.455
X5     1.6824          0.970
[Figure: five simulated two-signal data sets, Y1-Y5, on a 0-200 time axis]
Data   Independence   Correlation
Y1     0.0501         -0.212
Y2     0.0425         -0.199
Y3     0.0431         -0.118
Y4     0.2599          0.391
Y5     0.4741          0.734

Data   MPDF(1)   MPDF(2)   JPDF
Y1     0.2013    0.4266    0.0816
Y2     0.2013    0.4266    0.0816
Y3     0.2013    0.4266    0.0849
Y4     0.2013    0.4266    0.1047
Y5     0.2013    0.4266    0.1317
milca.m: MILCA applied to the simulated data sets
[Figure: five mixture signals on a 0-100 time axis and a resolved spectral profile on a 0-50 axis]
ICA solutions (Elution Profiles)
[Figure: five panels of ICA-resolved elution profiles, 0-100 time axis]
ICA solutions (Spectral Profiles)
[Figure: five panels of ICA-resolved spectral profiles, 0-50 axis, with the true spectra for comparison]
PCA.m
PCA solutions (Elution Profiles)
[Figure: five panels of PCA score profiles, 0-100 time axis]
PCA solutions (Spectral Profiles)
[Figure: five panels of PCA loading profiles, 0-50 axis]
mcrals.m: MCR-ALS applied to the same data sets
MCR solutions (Elution Profiles)
[Figure: five panels of MCR-resolved elution profiles, 0-100 time axis]
MCR solutions (Spectral Profiles)
[Figure: five panels of MCR-resolved spectral profiles, 0-50 axis]
Evaluation of the independence of the ICA solutions
Mutual information (MI) of the resolved profiles:

TRUE    ICA     ICA     MCR     PCA
0.686   0.3971  0.686   0.687   0.6414
0.686   0.419   0.686   0.686   0.6391
0.686   0.3976  0.71    0.715   0.582
0.686   0.4112  0.839   0.862   0.5854
0.686   0.419   1.423   1.44    0.5939

(Constraints noted on the slide: independence; independence and nonnegativity; orthogonality; nonnegativity.)
Independent Component Analysis is, in practice, Least-dependent Component Analysis.
Decreasing chromatographic resolution:
[Figure: added white noise, amplitude on the order of 1e-5, and the histogram of the noise]
ICA solutions
[Figure: ICA-resolved spectral profiles for data sets X1-X5, 0-50 axis, with the true spectrum for comparison]
Evaluation of the independence of the ICA solutions

Dataset   JPDF (TRUE / ICA / MCR)      MPDF(1) (TRUE / ICA / MCR)   MPDF(2) (TRUE / ICA / MCR)
1         23.208 / 23.214 / 23.209     2.934 / 2.934 / 2.934        2.906 / 2.906 / 2.906
2         23.208 / 23.267 / 23.267     2.934 / 2.934 / 2.934        2.906 / 2.901 / 2.901
3         23.208 / 25.571 / 26.615     2.934 / 2.952 / 2.932        2.906 / 2.728 / 2.701
4         23.208 / 36.638 / 37.126     2.934 / 2.595 / 2.826        2.906 / 2.815 / 2.588
5         23.208 / 110.324 / 112.022   2.934 / 2.579 / 2.645        2.906 / 2.643 / 2.580
Two-component reaction system (without noise)
[Figure: concentration and spectral profiles resolved by ICA and MCR, compared with the true profiles]
Feasible bands (concentration) and feasible bands (spectra), shown as solid lines
[Figure: feasible bands with the True, MCR and ICA solutions overlaid]
Does independence change within the area of feasible solutions?
Applications of ICA in Chemistry
- Data preprocessing
- Exploratory data analysis
- Multivariate resolution
- Multivariate calibration
- Multivariate classification
- Multivariate image analysis
Recent Advances in ICA
- Group independent component analysis for three-way data
Thanks for your attention …
Acknowledgements
- Prof. Mehdi Jalali-Heravi
- Prof. Roma Tauler
- Dr. Stefan Yord Platikanov
- My students
- Prof. Robert Rajko, for joining this workshop