12. Microscopic Structure of Bilinear Chemical Data
IASBS, Bahman 2-3, 1392 (January 22-23, 2014)
Independent Component Analysis (ICA)
Hadi Parastar
Sharif University of Technology
Every problem becomes very childish when once it is explained to you.
—Sherlock Holmes (The Dancing Men, A.C. Doyle, 1905)
Representation of Multivariate Data
- The key to understanding and interpreting multivariate data is a suitable representation
- Such a representation is achieved using some kind of transform
- Transforms can be linear or non-linear
- A linear transform W applied to a data matrix X, with objects as rows and variables as columns, is as follows:
U = WX + E
- Broadly speaking, linear transforms can be classified into two groups: second-order methods and higher-order methods
Linear Transform Techniques
Second-order methods:
- Principal component analysis (PCA)
- Factor analysis (FA) based methods
- Multivariate curve resolution (MCR)
Soft-modeling methods:
- Factor Analysis (FA)
- Principal Component Analysis (PCA)
- Blind source separation (BSS)
- Independent Component Analysis (ICA)
hplc.m: simulating HPLC-DAD data (a minimal sketch of such a simulation follows)
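The script hplc.m itself is not reproduced on the slides; the following is a minimal sketch of such a bilinear simulation, assuming Gaussian elution profiles and spectra (all names and parameter values here are illustrative, not the workshop's actual code):

% Bilinear HPLC-DAD simulation: X = C*S' + noise
t = (1:100)';  w = (1:50)';                       % time and wavelength axes
C = [exp(-(t-40).^2/50), exp(-(t-55).^2/80)];     % two Gaussian elution profiles
S = [exp(-(w-20).^2/40), exp(-(w-30).^2/60)];     % two Gaussian spectra
X = C*S' + 0.01*randn(100,50);                    % 100 x 50 data matrix
surf(X); shading interp                           % quick look at the data landscape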
emgpeak.m: chromatograms with distortions (tailed peaks; see the sketch below)
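emgpeak.m is not shown either; as the script name suggests, distorted chromatographic peaks are commonly modeled with an exponentially modified Gaussian (EMG), a Gaussian convolved with an exponential decay. A sketch with illustrative parameters:

% Exponentially modified Gaussian (EMG) peak
t = (1:150)';  mu = 50;  sigma = 4;  tau = 12;  h = 1;
emg = (h*sigma/tau)*sqrt(pi/2) .* exp(0.5*(sigma/tau)^2 - (t-mu)/tau) ...
      .* erfc((sigma/tau - (t-mu)/sigma)/sqrt(2));
plot(t, emg)      % tailed peak; tau -> 0 recovers the symmetric Gaussian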
Basic statistics
- Expectation
- Mean
- Correlation matrix
- Covariance matrix
(The defining equations, lost in extraction, are restated below.)
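The equations on these two slides did not survive extraction; the standard definitions they refer to, with the "Note" plausibly being the identity linking the two matrices, are:

E{x} = ∫ x p(x) dx            (expectation)
m = E{x}                      (mean vector)
R = E{x x^T}                  (correlation matrix)
C = E{(x - m)(x - m)^T}       (covariance matrix)
Note: C = R - m m^T, so for centered data (m = 0) the two matrices coincide.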
Principal Component Analysis (PCA)
- Using an eigenvector rotation, it is possible to decompose the X matrix into a series of loadings and scores
- Underlying or intrinsic factors (in the original psychometric applications, factors related to intelligence) could then be detected
- In chemistry, this approach is applied by diagonalizing the correlation or covariance matrix
Principal component analysis (PCA)
X = T P^T + E     (data = model + noise)
where X is the raw data matrix, T holds the scores, P^T the loadings, and E the residuals.
T P^T carries the explained variance; E carries the residual variance.
PCA model: D = U V^T + E
D = u1 v1^T + u2 v2^T + … + un vn^T + E
- n is the number of components (<< number of variables in D)
- each term ui vi^T has rank 1
- U holds the scores, V the loadings (projections), and E the unexplained variance
[Figure: data points plotted in the original coordinates (x1, x2), then re-expressed in the principal-component coordinates (u1, u2)]
Each principal component is a linear combination of the original variables:
u1 = a x1 + b x2
u2 = c x1 + d x2
PCA.m: demo script (a minimal SVD-based sketch follows)
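PCA.m is not reproduced on the slides; a minimal SVD-based PCA, assuming the data matrix has objects as rows (the example data and variable names are illustrative):

% PCA via SVD of the column-centered data
X = randn(100, 50);                  % example data (objects x variables)
Xc = X - mean(X, 1);                 % center each variable
[U, Sv, V] = svd(Xc, 'econ');        % economy-size SVD
T = U * Sv;                          % scores
P = V;                               % loadings
expl = 100 * diag(Sv).^2 / sum(diag(Sv).^2);   % explained variance per PC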
Inner Product (Dot Product)
x · x = x^T x = x1^2 + x2^2 + … + xn^2 = ||x||^2
x · y = x^T y = ||x|| ||y|| cos(q)
The cosine of the angle between two vectors is equal to the dot product of the normalized vectors:
cos(q) = (x · y) / (||x|| ||y||)
x · y = ||x|| ||y||   (parallel vectors)
x · y = -||x|| ||y||  (antiparallel vectors)
x · y = 0             (orthogonal vectors)
Two vectors x and y are orthogonal when their scalar product is zero.
Two vectors x and y are orthonormal when x · y = 0 and ||x|| = ||y|| = 1.
[Figure: PCA uses an orthogonal coordinate system (PC1, PC2); ICA uses a nonorthogonal coordinate system]
Independent Component Analysis: What Is It?
- ICA belongs to a class of blind source separation (BSS) methods
- The goal of BSS is to separate data into underlying informational components, where such data can take the form of spectra, images, sounds, telecommunication channels or stock market prices
- The term "blind" implies that such methods can separate data into source signals even if very little is known about the nature of those source signals
The Principle of ICA: the cocktail-party problem
x1(t) = a11 s1(t) + a12 s2(t) + a13 s3(t)
x2(t) = a21 s1(t) + a22 s2(t) + a23 s3(t)
x3(t) = a31 s1(t) + a32 s2(t) + a33 s3(t)
Independent Component Analysis (Herault and Jutten, 1991)
The observed vector x is modeled by a linear latent-variable model:
xi = Σj aij sj   (j = 1, …, m)
or, in matrix form:
x = As   (for the whole data set, X = AS)
where:
--- The mixing matrix A is constant
--- The si are latent variables called the independent components
--- Both A and s must be estimated, observing only x
Independent Component Analysis
X = A S^T + E    (ICA bilinear model)
ICA algorithms try to find the independent sources:
Ŝ^T = W X, with W = A^-1, so that Ŝ^T = W X = A^-1 A S^T = S^T
Compare: X = T P^T + E (PCA model) and X = C S^T + E (MCR model).
Independent Component Analysis Model
X = A S^T     Ŝ^T = W X
Basic properties of the ICA model
Must assume:
- The si are independent
- The si are nongaussian
- For simplicity: the matrix A is square
Consequences:
- The si are defined only up to a multiplicative constant
- The si are not ordered
[Figure: original sources compared with the ICA-recovered sources]
Statistical Independence
If two or more signals are statistically independent of each other, then the value of one signal provides no information about the values of the other signals.
For two variables: p(x1, x2) = p(x1) p(x2)
For more than two variables: p(x1, …, xn) = p(x1) p(x2) … p(xn)
Using the expectation operator: E{g1(x1) g2(x2)} = E{g1(x1)} E{g2(x2)} for any functions g1 and g2 (a numerical check follows).
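The expectation identity is easy to verify numerically; a quick illustrative check with two independently generated samples:

% For independent x and y, E{g1(x) g2(y)} = E{g1(x)} E{g2(y)}
x = randn(1e5, 1);  y = rand(1e5, 1);     % independent by construction
lhs = mean(x.^2 .* y.^3);                 % E{x^2 y^3}
rhs = mean(x.^2) * mean(y.^3);            % E{x^2} E{y^3}
fprintf('lhs = %.4f, rhs = %.4f\n', lhs, rhs)   % both close to 0.25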
Probability Density Function
Moments of probability density functions; a PDF is essentially a normalized histogram.
[Figure: histogram → approximation of the PDF → PDF]
Independence and Correlation
The term "correlated" tends to be used colloquially to suggest that two variables are related in a very general sense.
For independent variables, the entire structure of the joint pdf is implicit in the marginal pdfs, because the joint pdf can then be reconstructed exactly as the product of its marginal pdfs.
Covariance between x and y: cov(x, y) = E{(x - E{x})(y - E{y})}
[Figure: marginal PDFs and the joint PDF]
Independence and Correlation
The formal similarity between measures of independence and correlation can be interpreted as follows:
- Correlation is a measure of the amount of covariation between x and y, and depends only on the first moments of the pdf pxy.
- Independence is a measure of the covariation between [x raised to powers p] and [y raised to powers q], and depends on all moments of the pdf pxy.
Thus, independence can be considered a generalized measure of correlation: independence implies uncorrelatedness, but uncorrelatedness does not imply independence.
emgpeak.m: chromatograms with distortions
[Figure: pairs of simulated distorted chromatograms, on 0-150 and 0-200 time axes]
MutualInfo.m: joint and marginal probability density functions (a histogram-based sketch follows)
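MutualInfo.m is not reproduced; a self-contained sketch of estimating marginal and joint PDFs by histograms and computing the mutual information they define (bin count and variable names are illustrative):

% Histogram estimates of the marginal and joint PDFs, then MI
x1 = randn(5000, 1);  x2 = randn(5000, 1);   % independent, for illustration
nb = 20;                                     % number of bins per variable
p12 = histcounts2(x1, x2, nb, 'Normalization', 'probability');  % joint PDF
p1 = sum(p12, 2);  p2 = sum(p12, 1);         % marginals from the joint
pp = p1 * p2;                                % product of the marginals
nz = p12 > 0;                                % avoid log(0)
MI = sum(p12(nz) .* log(p12(nz) ./ pp(nz)))  % near 0 for independent signals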
[Figure: two pairs of signals on a 0-150 time axis, with their probability summaries]
Left pair:  Joint PDF = 0.0879, Marginal PDF 1 = 0.3017, Marginal PDF 2 = 0.3017, Correlation = -0.1847; 0.3017 × 0.3017 = 0.0910 ≈ 0.0879 (nearly independent)
Right pair: Joint PDF = 0.4335, Marginal PDF 1 = 0.3017, Marginal PDF 2 = 0.3017, Correlation = 0.9701; 0.3017 × 0.3017 = 0.0910 ≠ 0.4335 (dependent)
[Figure: two pairs of signals on a 0-200 time axis, with their probability summaries]
Left pair:  Joint PDF = 0.0816, Marginal PDF 1 = 0.2013, Marginal PDF 2 = 0.4266, Correlation = -0.2123; 0.2013 × 0.4266 = 0.0859 ≈ 0.0816 (nearly independent)
Right pair: Joint PDF = 0.1317, Marginal PDF 1 = 0.2038, Marginal PDF 2 = 0.4265, Correlation = 0.7339; 0.2013 × 0.4266 = 0.0859 ≠ 0.1317 (dependent)
What does nongaussianity mean in ICA?
- Intuitively, one can say that Gaussian distributions are "too simple".
- The higher-order cumulants are zero for Gaussian distributions, but such higher-order information is essential for estimating the ICA model.
- Higher-order methods use information on the distribution of x that is not contained in the covariance matrix.
- The distribution of x must therefore not be assumed Gaussian, because all the information about Gaussian variables is contained in the covariance matrix.
- Thus, ICA is essentially impossible if the observed variables have Gaussian distributions.
- Note that the basic model does not assume we know what the nongaussian distributions of the ICs look like; if they are known, the problem is considerably simplified.
Assume the joint distribution of two ICs, s1 and s2, is Gaussian with unit variance:
p(s1, s2) = (1/2π) exp(-(s1^2 + s2^2)/2)
With an orthogonal mixing matrix A, the joint density of the mixtures x1 and x2 is as follows:
p(x1, x2) = (1/2π) exp(-(x1^2 + x2^2)/2)
What does nongaussianity mean in ICA?
Due to orthogonality, ||A^T x|| = ||x||, so the mixing matrix drops out of the density.
We see that the orthogonal mixing matrix does not change the pdf, since it does not appear in the pdf at all.
The original and mixed distributions are identical; therefore, there is no way we could infer the mixing matrix from the mixtures.
How to estimate the ICA model
- Principle for estimating the ICA model: maximization of nongaussianity
- Nongaussianity → Independence
Nongaussianity Measures
- Kurtosis: fourth-order cumulant
- Entropy
- Negentropy: differential entropy
- Mutual information
Kurtosis
- Extrema of kurtosis give independent components
- If x is Gaussian, then kurt(x) = 0: the kurtosis is zero for Gaussian variables
- Variables with positive kurtosis are called supergaussian
- Variables with negative kurtosis are called subgaussian
Measures for Nongaussianity: Kurtosis
kurt(x) = E{(x - μ)^4} - 3 [E{(x - μ)^2}]^2
- Supergaussian: kurtosis > 0
- Gaussian: kurtosis = 0
- Subgaussian: kurtosis < 0
For independent x1 and x2: kurt(x1 + x2) = kurt(x1) + kurt(x2)
For a scalar α: kurt(α x1) = α^4 kurt(x1)
(A numerical illustration follows.)
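A quick illustrative check of the sign of the kurtosis for the three classes (the distribution choices are mine, not the slides'):

% Sign of kurt(x) = E{x^4} - 3[E{x^2}]^2 for zero-mean x
n = 1e5;
k = @(x) mean(x.^4) - 3 * mean(x.^2).^2;
g = randn(n, 1);                                   % Gaussian:       ~0
l = sign(rand(n,1) - 0.5) .* (-log(rand(n,1)));    % Laplacian-like: > 0
u = (rand(n,1) - 0.5) * sqrt(12);                  % uniform:        < 0
fprintf('gaussian %.2f  supergaussian %.2f  subgaussian %.2f\n', k(g), k(l), k(u))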
Mutual Information
- Mutual information (MI) can be defined as a natural measure of the mutual dependence between two variables.
- MI is always non-negative, and it is zero if the two variables are independent.
- MI can be defined using the joint and marginal PDFs as follows:
I(x1, x2) = ∫∫ p(x1, x2) log [ p(x1, x2) / (p(x1) p(x2)) ] dx1 dx2
Mutual Information Based on Entropy
- Entropy is a measure of the uniformity of the distribution of a bounded set of values; complete uniformity corresponds to maximum entropy.
- In information-theoretic terms, entropy is the measure of the randomness of a signal.
- A Gaussian signal has the largest entropy among all signal distributions of unit variance.
- Entropy is small for signals whose distribution is concentrated on certain values, i.e. whose pdf is very "spiky".
H(xi) = -∫ p(xi) log p(xi) dxi
H(x1, x2) = -∫∫ p(x1, x2) log p(x1, x2) dx1 dx2
I(x1, x2) = H(x1) + H(x2) - H(x1, x2)
Entropy can thus be used as a measure of nongaussianity.
Ambiguities in ICA solutions
- Scale (intensity) ambiguity: xij = Σn ain snj = Σn (ain / kn)(kn snj) for any nonzero constants kn
- Permutation ambiguity: X = (A T)(T^-1 S^T) + E = C S^T + E, with C = A T and the permuted sources T^-1 S^T, for any permutation matrix T
Central Limit Theorem (CLT)
[Figure: a Gaussian PDF]
The distribution of a sum of independent random variables tends toward a Gaussian distribution:
Observed signal = m1·IC1 + m2·IC2 + … + mn·ICn, where each IC is non-Gaussian but the sum drifts toward Gaussian.
Fortunately, the CLT does not place restrictions on how much each source signal contributes to a signal mixture, so this result holds true even if the mixing coefficients are not equal to unity. (A numerical illustration follows.)
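The CLT effect is easy to see numerically; an illustrative sketch (the source distribution and counts are my choices):

% Excess kurtosis of a sum of independent non-Gaussian sources
% shrinks toward the Gaussian value 0 as more sources are added
n = 1e5;
k = @(x) mean(x.^4) ./ mean(x.^2).^2 - 3;          % excess kurtosis
s = sign(rand(n,8) - 0.5) .* (-log(rand(n,8)));    % 8 supergaussian sources
for m = [1 2 4 8]
    fprintf('m = %d  excess kurtosis = %.2f\n', m, k(sum(s(:,1:m), 2)));
end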
Preprocessing: Centering
--- This step simplifies ICA algorithms by allowing us to assume a zero mean:
xc = x - E{x} = x - m
Preprocessing: Whitening
--- Whitening involves linearly transforming the observation vector such that its components are uncorrelated and have unit variance:
E{xw xw^T} = I, where xw is the whitened vector
--- A simple method to perform the whitening transformation is to use the EigenValue Decomposition (EVD) of the covariance of x:
E{x x^T} = V D V^T
xw = V D^(-1/2) V^T x = V D^(-1/2) V^T A s = Aw s
E{xw xw^T} = Aw E{s s^T} Aw^T = Aw Aw^T = I
Whitening thus reduces the number of parameters to be estimated (the new mixing matrix Aw is orthogonal); a sketch follows.
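A self-contained sketch of EVD whitening matching the formulas above (the example data is illustrative):

% EVD whitening of a centered d x N data matrix X
X = [1 2; 1 1] * randn(2, 1000);                % example mixed data
X = X - mean(X, 2);                             % center the rows
[V, D] = eig(cov(X'));                          % EVD of the covariance
Xw = V * diag(1 ./ sqrt(diag(D))) * V' * X;     % Xw = V D^(-1/2) V' X
disp(cov(Xw'))                                  % approximately the identity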
% Two independent Gaussian sources, then a linear mixture
S1 = randn(1000, 1);  S2 = randn(1000, 1);
plot(S1, S2, '*');
A = [1 2; 1 1];               % mixing matrix
S = [S1'; S2'];               % 2 x 1000 source matrix
X = A * S;                    % mixed data
figure; plot(X(1,:), X(2,:), '*');
pcamat.m: PCA decomposition (FastICA package)
whitenv.m: for data whitening
[E, D] = pcamat(X);
Xw = whitenv(X, E, D);
plot(Xw(1,:), Xw(2,:), '*');
Objective (contrast) functions for ICA
ICA method = objective function + optimization algorithm
- The statistical properties of the ICA method depend on the choice of objective function: consistency, robustness, asymptotic variance
- The algorithmic properties depend on the optimization algorithm: convergence speed, memory requirements, numerical stability
Different ICA Algorithms
- FastICA
- Information Maximization (Infomax)
- Joint Approximate Diagonalization of Eigenmatrices (JADE)
- Robust Accurate Direct Independent Component analysis aLgorithm (RADICAL)
- Mutual Information based Least-dependent Component Analysis (MILCA)
- Stochastic Nonnegative ICA (SNICA)
- Mean-Field ICA (MFICA)
- Window ICA (WICA)
- Kernel ICA (KICA)
- Group ICA (GICA)
(A sketch of the fixed-point idea behind FastICA follows.)
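None of these implementations are reproduced here; as an illustration of the fixed-point idea behind the kurtosis-based FastICA (one component only; the data generation and names are my own, not the workshop's code):

% Two supergaussian sources, mixed, centered and whitened
n = 2000;
S = sign(rand(2,n) - 0.5) .* (-log(rand(2,n)));   % Laplacian-like sources
X = [1 2; 1 1] * S;  X = X - mean(X, 2);
[V, D] = eig(cov(X'));  Xw = V * diag(1./sqrt(diag(D))) * V' * X;
% One-unit fixed-point iteration maximizing |kurtosis|
w = randn(2, 1);  w = w / norm(w);
for it = 1:200
    wNew = mean(Xw .* (w'*Xw).^3, 2) - 3*w;       % E{x (w'x)^3} - 3w
    wNew = wNew / norm(wNew);
    if abs(abs(wNew'*w) - 1) < 1e-8, w = wNew; break; end
    w = wNew;
end
ic1 = w' * Xw;                                    % one recovered component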
[Figure: five simulated two-signal data sets, X1-X5, on a 0-150 time axis]
Data   MPDF(1)   MPDF(2)   JPDF
X1     0.3017    0.3017    0.0879
X2     0.3017    0.3017    0.0878
X3     0.3017    0.3017    0.0932
X4     0.3017    0.3017    0.1141
X5     0.3017    0.3017    0.4335

Data   Independence   Correlation
X1     0.0373         -0.185
X2     0.0355         -0.182
X3     0.0649         -0.053
X4     0.3082          0.455
X5     1.6824          0.970
[Figure: five simulated two-signal data sets, Y1-Y5, on a 0-200 time axis]
Data   Independence   Correlation
Y1     0.0501         -0.212
Y2     0.0425         -0.199
Y3     0.0431         -0.118
Y4     0.2599          0.391
Y5     0.4741          0.734

Data   MPDF(1)   MPDF(2)   JPDF
Y1     0.2013    0.4266    0.0816
Y2     0.2013    0.4266    0.0816
Y3     0.2013    0.4266    0.0849
Y4     0.2013    0.4266    0.1047
Y5     0.2013    0.4266    0.1317
milca.m: MILCA applied to the simulated data sets
[Figure: five mixture signals on a 0-100 time axis and a resolved spectral profile on a 0-50 axis]
ICA solutions (Elution Profiles)
[Figure: five panels of ICA-resolved elution profiles, 0-100 time axis]
ICA solutions (Spectral Profiles)
[Figure: five panels of ICA-resolved spectral profiles, 0-50 axis, with the true spectra for comparison]
PCA.m
PCA solutions (Elution Profiles)
[Figure: five panels of PCA score profiles, 0-100 time axis]
PCA solutions (Spectral Profiles)
[Figure: five panels of PCA loading profiles, 0-50 axis]
mcrals.m: MCR-ALS applied to the same data sets
MCR solutions (Elution Profiles)
[Figure: five panels of MCR-resolved elution profiles, 0-100 time axis]
MCR solutions (Spectral Profiles)
[Figure: five panels of MCR-resolved spectral profiles, 0-50 axis]
Evaluation of the independence of the ICA solutions
Mutual information (MI) of the resolved profiles:

TRUE    ICA     ICA     MCR     PCA
0.686   0.3971  0.686   0.687   0.6414
0.686   0.419   0.686   0.686   0.6391
0.686   0.3976  0.71    0.715   0.582
0.686   0.4112  0.839   0.862   0.5854
0.686   0.419   1.423   1.44    0.5939

(Constraints noted on the slide: independence; independence and nonnegativity; orthogonality; nonnegativity.)
Independent Component Analysis is, in practice, Least-dependent Component Analysis.
Decreasing chromatographic resolution:
[Figure: added white noise, amplitude on the order of 1e-5, and the histogram of the noise]
ICA solutions
[Figure: ICA-resolved spectral profiles for data sets X1-X5, 0-50 axis, with the true spectrum for comparison]
Evaluation of the independence of the ICA solutions

Dataset   JPDF (TRUE / ICA / MCR)      MPDF(1) (TRUE / ICA / MCR)   MPDF(2) (TRUE / ICA / MCR)
1         23.208 / 23.214 / 23.209     2.934 / 2.934 / 2.934        2.906 / 2.906 / 2.906
2         23.208 / 23.267 / 23.267     2.934 / 2.934 / 2.934        2.906 / 2.901 / 2.901
3         23.208 / 25.571 / 26.615     2.934 / 2.952 / 2.932        2.906 / 2.728 / 2.701
4         23.208 / 36.638 / 37.126     2.934 / 2.595 / 2.826        2.906 / 2.815 / 2.588
5         23.208 / 110.324 / 112.022   2.934 / 2.579 / 2.645        2.906 / 2.643 / 2.580
Two-component reaction system (without noise)
[Figure: concentration and spectral profiles resolved by ICA and MCR, compared with the true profiles]
Feasible bands (concentration) and feasible bands (spectra), shown as solid lines
[Figure: feasible bands with the True, MCR and ICA solutions overlaid]
Does independence change within the area of feasible solutions?
Applications of ICA in Chemistry
- Data preprocessing
- Exploratory data analysis
- Multivariate resolution
- Multivariate calibration
- Multivariate classification
- Multivariate image analysis
Recent Advances in ICA
- Group independent component analysis for three-way data
Thanks for your attention …
Acknowledgements
- Prof. Mehdi Jalali-Heravi
- Prof. Roma Tauler
- Dr. Stefan Yord Platikanov
- My students
- Prof. Robert Rajko, for joining this workshop