INVESTIGATION OF MAIN CONTAMINATION SOURCES OF HEAVY METAL IONS IN FISH,
SEDIMENTS, AND WATERS FROM CATALONIA RIVERS USING DIFFERENT MULTIWAY DATA
ANALYSIS METHODS
Emma Peré-Trepat1 and Romà Tauler 2*
1 Dept. of Analytical Chemistry, Universitat de Barcelona, Diagonal 647, 08028 Barcelona, Spain
2 IIQAB-CSIC, Jordi Girona 18-26, 08034 Barcelona, Spain
* e-mail: [email protected]
Outline:
• Introduction and motivations of this work
• Environmental data tables and chemometrics models and methods
• Example of application: metal contamination sources in fish, sediment and surface water river samples.
• Conclusions
Introduction and motivations of this work
• Pollution and toxic chemical compounds are a threat to the environment and to health, and they call for urgent measures and actions
• Environmental monitoring studies produce huge amounts of multivariate data ordered in large data tables (data matrices)
• The bottleneck in the study of these environmental data tables is their analysis and interpretation
• Chemometric analysis (statistical and numerical analysis of multivariate chemical data) of these data tables is needed!
What kind of information can be obtained from chemometric analysis of environmental multivariate data tables?
1. Detection, identification, interpretation and resolution of the main sources of contamination
2. Distribution of these contamination sources in the environment: geographically, temporally, by environmental compartment (air, water, sediments, biota, ...), ...
3. Distinction between point and diffuse contamination sources
4. Quantitative apportionment of these sources, ...
In this work different chemometric multiway data analysis methods are compared for the resolution of the environmental sources of 11 metal ions in 17 river samples of fish, sediment and water at the same site locations of Catalonia (NE, Spain).
• Two-way bilinear model based methods
  • MA-PCA: Matrix Augmentation Principal Component Analysis
  • MA-MCR-ALS: Matrix Augmentation Multivariate Curve Resolution Alternating Least Squares
• Three-way trilinear model based methods
  • PARAFAC
  • TUCKER3
  • MCR-ALS trilinear
  • MCR-ALS TUCKER3
Special attention will be paid to:
• Finding ways to compare results obtained using bilinear and trilinear models for three-way data: getting profiles in three modes from bilinear models of three-way data
• Adaptation of MCR-ALS to the fulfillment of PARAFAC and TUCKER3 trilinear models
• Reliability of solutions: calculation of boundaries of bands of feasible solutions
• Integration of Geostatistics and Chemometrics in the investigation of environmental data
Environmental data tables (two-way data)
[Figure: data table or matrix X with I samples (rows) and J variables (columns), shown together with a plot of samples (rows) and a plot of variables (columns). Variables can be concentrations of chemicals, physical properties, biological properties, or others. Table entries may be missing ('m') or below the limit of detection (<LOD).]
Environmental three-way data sets
Measured data usually consist of concentrations of different chemical compounds (variables) measured in different samples at different times/situations/conditions/compartments.
Data are ordered in a two-way or in a three-way data table according to their structure.
[Figure: three-way data set with axes samples, variables (conc. of chemical compounds) and time/compartment]
Three measurement modes:
- variables mode
- sample mode
- times/situations/conditions/compartments mode
Models for what?
Models for:
1. identification of contamination sources?
2. exploration of contamination sources?
3. interpretation of contamination sources?
4. resolution of environmental sources?
5. apportionment/quantitation of environmental sources?
6. ...
Chemometric models to describe environmental measurements
Bilinear models for two-way data:
$$d_{ij} = \sum_{n=1}^{N} x_{in}\, y_{nj} + e_{ij}$$
$$\mathbf{D} = \mathbf{X}\mathbf{Y}^T + \mathbf{E}$$
where
d_{ij} is the concentration of chemical contaminant j in sample i;
n = 1, ..., N are a reduced number of independent environmental sources;
x_{in} is the amount of source n in sample i;
y_{nj} is the amount of contaminant j in source n.
[Figure: data matrix D (I × J) with elements d_{ij}]
Chemometric models to describe environmental measurements
Bilinear models for two-way data:
$$\mathbf{D} = \mathbf{X}\mathbf{Y}^T + \mathbf{E}$$
with D (I × J), X (I × N), Y^T (N × J), E (I × J) and N << I or J
PCA
• X orthogonal, Y^T orthonormal
• Y^T in the direction of maximum variance
• Unique solutions but without physical meaning
• Identification and interpretation!
MCR-ALS
• X and Y^T non-negative
• X or Y^T normalization
• other constraints (unimodality, local rank, ...)
• Non-unique solutions but with physical meaning
• Resolution and apportionment!
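The contrast between the two sets of constraints can be illustrated with a small sketch. The data below are synthetic, the function name `mcr_als_nonneg` is hypothetical, and non-negativity is imposed by simply clipping each least-squares update at zero, which is a simplification and not necessarily what the MCR-ALS software used in this work does.

```python
import numpy as np

def mcr_als_nonneg(D, Y0, n_iter=200):
    """Alternating least squares for D ~ X @ Y.T with X >= 0 and Y >= 0.

    Assumption: non-negativity is enforced by clipping unconstrained
    least-squares updates at zero; loadings are normalised to unit length.
    """
    Y = Y0 / np.linalg.norm(Y0, axis=0)
    for _ in range(n_iter):
        # Scores given loadings: solve Y @ X.T = D.T in the least-squares sense
        X = np.clip(np.linalg.lstsq(Y, D.T, rcond=None)[0].T, 0.0, None)
        # Loadings given scores: solve X @ Y.T = D
        Y = np.clip(np.linalg.lstsq(X, D, rcond=None)[0].T, 0.0, None)
        Y = Y / np.maximum(np.linalg.norm(Y, axis=0), 1e-12)
    # Final scores consistent with the normalised loadings
    X = np.clip(np.linalg.lstsq(Y, D.T, rcond=None)[0].T, 0.0, None)
    return X, Y

# Synthetic example: 17 samples, 11 variables, 2 non-negative sources
rng = np.random.default_rng(0)
X_true = rng.random((17, 2))
Y_true = rng.random((11, 2))
D = X_true @ Y_true.T

# MCR-ALS type solution: non-negative scores and loadings
X, Y = mcr_als_nonneg(D, rng.random((11, 2)))

# PCA type solution (no mean-centring): orthonormal loadings from the SVD
U, s, Vt = np.linalg.svd(D, full_matrices=False)
Yt_pca = Vt[:2]
```

For strictly positive data the second PCA loading is orthogonal to an all-positive (or all-negative) first loading, so it necessarily mixes signs, whereas the constrained profiles stay non-negative and can be read as source compositions.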
Chemometric models to describe environmental measurements
Extension of bilinear models for simultaneous analysis of multiple two-way data sets (matrix augmentation strategy)
[Figure: each data matrix D_k (I × J) is decomposed into scores X_k (I × n) and common loadings Y^T (n × J); stacking the D_k gives the augmented model D_aug = X_aug Y^T. PCA: orthogonality, maximum variance. MCR: non-negativity, natural constraints.]
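A minimal sketch of the matrix augmentation strategy, assuming synthetic data and a stacking order of fish, sediment, water (the order is an assumption made here for illustration). The bilinear decomposition is done by truncated SVD, that is, PCA without mean-centring.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 17, 11, 3   # sites, metals, compartments

# Hypothetical data: one (I x J) matrix per compartment
D_fish, D_sed, D_water = (rng.random((I, J)) for _ in range(K))

# Column-wise augmentation: the matrices share the variables (columns) mode
D_aug = np.vstack([D_fish, D_sed, D_water])      # shape (K*I, J)

# Bilinear model of the augmented matrix: D_aug ~ X_aug @ Yt
N = 2
U, s, Vt = np.linalg.svd(D_aug, full_matrices=False)
X_aug = U[:, :N] * s[:N]                         # augmented scores, (K*I, N)
Yt = Vt[:N]                                      # common loadings, (N, J)

# The scores of each individual matrix D_k are the row blocks of X_aug
X_k = X_aug.reshape(K, I, N)
```

Here `X_k[1] @ Yt` is the model of the sediment matrix alone, which shows how a single set of loadings describes all compartments while each compartment keeps its own scores.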
Chemometric models to describe environmental measurements
Trilinear models for three-way data:
$$d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{jn}\, z_{kn} + e_{ijk}$$
$$\mathbf{D}_k = \mathbf{X}\mathbf{Z}_k\mathbf{Y}^T + \mathbf{E}_k$$
with i = 1, ..., I; j = 1, ..., J; k = 1, ..., K, where
d_{ijk} is the concentration of chemical contaminant j in sample i at time (condition) k;
n = 1, ..., N are a reduced number of independent environmental sources;
x_{in} is the amount of source n in sample i;
y_{jn} is the amount of contaminant j in source n;
z_{kn} is the contribution of source n to compartment k.
[Figure: environmental data sets arranged as a cube of slices D_k]
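Three-way data models
[Figure: data cube D (I, J, K) with modes samples (X-mode), variables (Y-mode) and conditions (Z-mode), decomposed into the loading matrices X, Y^T and Z with Ni, Nj and Nk components, respectively]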
PARAFAC (trilinear model)
• The same number of components in the three modes: Ni = Nj = Nk = N
• No interactions between components
• The different slices (D_k) are decomposed into bilinear profiles having the same shape!
$$d_{ijk} = \sum_{n=1}^{N} x_{in}\, y_{jn}\, z_{kn} + e_{ijk}$$
$$\mathbf{D}_k = \mathbf{X}\mathbf{Z}_k\mathbf{Y}^T + \mathbf{E}_k$$
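The equivalence of the element-wise and slice-wise forms of the trilinear model can be checked numerically. The sketch below uses synthetic, noise-free profiles and takes Z_k to be the diagonal matrix built from the k-th row of Z, which is the usual reading of the slice-wise equation.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K, N = 17, 11, 3, 2

# Hypothetical non-negative profiles in the three modes
X = rng.random((I, N))   # samples mode
Y = rng.random((J, N))   # variables mode
Z = rng.random((K, N))   # conditions/compartments mode

# Element-wise form: d_ijk = sum_n x_in * y_jn * z_kn
D = np.einsum('in,jn,kn->ijk', X, Y, Z)

# Slice-wise form: D_k = X @ Z_k @ Y.T, with Z_k = diag(k-th row of Z)
slices = np.stack([X @ np.diag(Z[k]) @ Y.T for k in range(K)], axis=2)
```

Every slice shares the same X and Y profiles and differs only in the scaling given by Z_k, which is what "profiles having the same shape" means in practice.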
Tucker3 models
$$d_{ijk} = \sum_{n_i=1}^{N_i}\sum_{n_j=1}^{N_j}\sum_{n_k=1}^{N_k} g_{n_i n_j n_k}\, x_{i n_i}\, y_{j n_j}\, z_{k n_k} + e_{ijk}$$
[Figure: D = X G (Y^T, Z), data cube D decomposed into X, Y^T, Z and the core array G]
• Different number of components in the different modes: Ni ≠ Nj ≠ Nk
• Interaction between components in different modes is possible
In PARAFAC, Ni = Nj = Nk = N and the core array G is a superdiagonal identity cube
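A short numerical sketch of the Tucker3 model and of its relation to PARAFAC, using synthetic profiles. The helper name `tucker3` is hypothetical, and the [1 2 2] dimensions are chosen only to mirror the model discussed later in the talk.

```python
import numpy as np

def tucker3(G, X, Y, Z):
    """Noise-free Tucker3 model: d_ijk = sum_abc g_abc * x_ia * y_jb * z_kc."""
    return np.einsum('abc,ia,jb,kc->ijk', G, X, Y, Z)

rng = np.random.default_rng(2)
I, J, K = 17, 11, 3

# Tucker3 model with a different number of components per mode
Ni, Nj, Nk = 1, 2, 2
X = rng.random((I, Ni))
Y = rng.random((J, Nj))
Z = rng.random((K, Nk))
G = rng.random((Ni, Nj, Nk))       # core array: interactions between components
D = tucker3(G, X, Y, Z)

# PARAFAC as a special case: same N in all modes, superdiagonal identity core
N = 2
Xp, Yp, Zp = rng.random((I, N)), rng.random((J, N)), rng.random((K, N))
G_super = np.zeros((N, N, N))
idx = np.arange(N)
G_super[idx, idx, idx] = 1.0
D_parafac = np.einsum('in,jn,kn->ijk', Xp, Yp, Zp)
```

With the superdiagonal core, `tucker3(G_super, Xp, Yp, Zp)` reproduces `D_parafac`, because all cross terms between different components are multiplied by zero.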
Guidelines for method selection (resolution purposes)
[Figure: diagram relating deviations from trilinearity (mild, medium, strong) and array size (small, medium, large) to the suggested methods: PARAFAC, PARAFAC2, TUCKER, and MCR, PCA, SVD, ...]
Journal of Chemometrics, 2001, 15, 749-771
METHODOLOGY
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)
METAL CONTAMINATION SOURCES IN SEDIMENTS, FISH AND WATERS FROM CATALONIA RIVERS USING MULTIWAY DATA ANALYSIS METHODS
Emma Peré-Trepat (UB), Mónica Flo, Montserrat Muñoz, Antoni Ginebreda (ACA), Marta Terrado, Romà Tauler (CSIC)
[Figure: map of Catalonia with the 17 numbered sampling sites; map labels: Mediterranean Sea, Pyrenees, Barcelona, France, Aragón]
1. RIU MUGA, Castelló d'Empúries, J052
2. RIU FLUVIÀ, Besalú, J022
3. RIU FLUVIÀ, L'Armentera, J011
4. RIU TER, Manlleu, J034
5. RIU TERRI, Sant Julià de Ramis, J028
6. RIU TER, Colomers, J112
7. RIU TORDERA, Fogars de Tordera, J062
8. RIU CONGOST, La Garriga, J037
9. RIU LLOBREGAT, El Pont de Vilomara, J031
10. RIU CARDENER, Castellgalí, J002
11. RIU LLOBREGAT, Abrera, J084
12. RIU LLOBREGAT, Martorell, J005
13. RIU LLOBREGAT, Sant Joan Despí, J049
14. RIU FOIX, Castellet, J008
15. RIU FRANCOLÍ, La Masó, J059
16. RIU EBRE, Flix, J056
17. RIU SEGRE, Térmens, J207
17 river sites, 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), 3 environmental compartments: fish (barb, 'bagra comuna', bleak, carp and trout), sediment and water samples
Missing data ('m')
• Unknown values produce empty holes in data matrices
• When they are few and evenly distributed, they may be estimated by PCA imputation (or another method)
Below LOD values (<LOD)
• This is a common problem in environmental data tables
• If most of the values are below LOD, data matrices are sparse
• For calculations, it is better either to use the experimental values or to set them to LOD/2, instead of setting them to zero or to LOD
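Both treatments can be sketched in a few lines. The function names, the toy numbers and the particular iterative scheme (refilling missing cells with a truncated-SVD reconstruction) are assumptions chosen for illustration; the slides do not specify which PCA imputation variant was used.

```python
import numpy as np

def replace_below_lod(values, below_lod, lod):
    """Set entries flagged as <LOD to LOD/2; lod holds one LOD per variable."""
    out = values.astype(float).copy()
    half = np.broadcast_to(lod / 2.0, out.shape)
    out[below_lod] = half[below_lod]
    return out

def pca_impute(D, n_components=2, n_iter=100):
    """Fill NaN cells iteratively with a low-rank (truncated SVD) model."""
    missing = np.isnan(D)
    # Start from the column means of the observed values
    filled = np.where(missing, np.nanmean(D, axis=0), D)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components]
        filled[missing] = approx[missing]   # observed cells are never changed
    return filled

# Toy table: 3 samples x 3 variables, with hypothetical LODs per variable
lod = np.array([0.1, 0.5, 0.02])
raw = np.array([[1.2, 0.0, 0.30],
                [0.0, 2.0, 0.00],
                [0.8, 1.5, 0.10]])
flags = np.array([[False, True, False],
                  [True, False, True],
                  [False, False, False]])
clean = replace_below_lod(raw, flags, lod)

# Toy table with one missing cell
D_full = np.outer(np.arange(1.0, 6.0), [1.0, 2.0, 3.0])
D_miss = D_full.copy()
D_miss[2, 1] = np.nan
filled = pca_impute(D_miss, n_components=1)
```

Keeping the observed cells fixed and updating only the missing ones is what restricts this kind of imputation to tables with few, evenly distributed gaps.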
Preliminary data description: use of descriptive statistics
1) Individual sample plots
2) Individual variable plots
3) Descriptive statistics (Excel Statistics)
4) Histograms/box plots
5) Binary correlation between variables
6) ...
[Figure: box plot by column number, annotated with lower quartile, median, upper quartile, lower whisker, upper whisker and outliers]
Effect of different data pre-treatments: sediment samples
[Figure: four panels (raw, mean-centred, auto-scaled, scaled) for the 11 metals As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn]
Mo is eliminated
Data pretreatment
– No mean-centering was applied, to allow an improved physical interpretation of factors (application of non-negativity constraints instead of orthogonality constraints) and the comparison of results using MCR-ALS methods
– Two scaling possibilities:
  • First, data matrix augmentation and then column scaling to equal variance (each column element divided by its standard deviation)
  • First, column scaling of each data matrix separately and then data matrix augmentation
– Variables with nearly no changes and equal or close to their limit of detection were removed from scaling and divided by 20 (to avoid overweighting them)
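The two scaling orders and the downweighting rule can be written compactly. This is a sketch on synthetic data; the function name `scale_columns`, the use of the sample standard deviation (`ddof=1`) and the column indices passed to `downweight` are assumptions made for the example.

```python
import numpy as np

def scale_columns(D, downweight=()):
    """Divide each column by its standard deviation, without mean-centring.

    Columns listed in `downweight` are not scaled; they are divided by 20.
    """
    divisor = D.std(axis=0, ddof=1)
    divisor[np.asarray(downweight, dtype=int)] = 20.0
    return D / divisor

rng = np.random.default_rng(3)
# Hypothetical fish, sediment and water matrices (17 sites x 11 metals)
blocks = [rng.random((17, 11)) for _ in range(3)]

# Option 1: augment first, then scale the columns of the augmented matrix
D_aug_1 = scale_columns(np.vstack(blocks))

# Option 2: scale each matrix separately, then augment
D_aug_2 = np.vstack([scale_columns(B) for B in blocks])

# Downweighting two near-constant variables (indices are illustrative only)
water_scaled = scale_columns(blocks[2], downweight=[2, 3])
```

Because no mean is subtracted, non-negative concentrations stay non-negative after scaling, which is what makes the later non-negativity constraints applicable. Option 2 gives every compartment the same weight per metal, whereas option 1 preserves differences in variability between compartments.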
Description of scaled data: metal distribution in the three compartments
[Figure: mean ± std of scaled concentrations of the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) in water, sediments and fish; x-axis: metals (variables)]
Cd, Co and Ld in water were not scaled; only downweighted
Description of scaled data: different sites in the three compartments
[Figure: scaled data for the 17 sample sites in water, sediment and fish; x-axis: sample sites, labelled by river (Muga, Fluvià, Ter, Terri, Tordera, Congost, Llobregat, Cardener, Foix, Francolí, Ebre, Segre)]
Unit variance scaled concentrations boxplot
[Figure: three box plot panels (fish, sediment, water) of the scaled values of the 11 metals As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn]
SVD of augmented data matrices in the three directions
[Figure: singular values for svd column-wise (variables), svd row-wise (samples) and svd tube-wise (type); the 2nd component is marked]

AUGMENTATION direction   column    row       tube
s1                       40.2619   43.2553   41.3302
s2                       16.7504    9.2823   19.4850
s3                        9.4963    8.5312   14.3739

How many components are needed to explain each mode?
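The three sets of singular values come from unfolding the same data cube in three different directions. The sketch below uses a synthetic cube, and the mapping of "column-wise", "row-wise" and "tube-wise" onto the three unfoldings is an assumption based on the labels in the slide.

```python
import numpy as np

rng = np.random.default_rng(4)
I, J, K = 17, 11, 3                       # sites, metals, compartments
cube = rng.random((I, J, K))              # hypothetical scaled data cube

# Column-wise augmentation: compartments stacked vertically, metals as columns
col_wise = cube.transpose(2, 0, 1).reshape(K * I, J)
# Row-wise augmentation: compartments placed side by side, sites as rows
row_wise = cube.reshape(I, J * K)
# Tube-wise augmentation: one row per compartment
tube_wise = cube.transpose(2, 0, 1).reshape(K, I * J)

# Singular values of each unfolding (largest first)
s_col = np.linalg.svd(col_wise, compute_uv=False)
s_row = np.linalg.svd(row_wise, compute_uv=False)
s_tube = np.linalg.svd(tube_wise, compute_uv=False)
```

All three unfoldings contain the same numbers, so their total sum of squares is identical; what changes is how quickly the singular values decay, which suggests how many components each mode needs.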
THREE-WAY DATA ARRAY MATRICIZING or MATRIX AUGMENTATION
Bilinear modelling of three-way data (matrix augmentation or matricizing, stretching, unfolding): MA-PCA, MA-MCR-ALS
[Figure: the data cube (sites × contaminants × compartments) is unfolded by stacking the fish (F), sediment (S) and water (W) matrices into the augmented data matrix D_aug, which is decomposed into the augmented scores matrix X_aug and the loadings Y^T (metals)]
Explained variances using bilinear models (profiles in two modes)
$$R^2 = 1 - \frac{\sum_{i=1}^{I}\sum_{j=1}^{J} e_{i,j}^2}{\sum_{i=1}^{I}\sum_{j=1}^{J} d_{i,j}^2}, \qquad e_{i,j} = d_{i,j} - \hat{d}_{i,j}$$
where d_{i,j} is the experimental value in the augmented data matrix for metal j and sample i, and \hat{d}_{i,j} is the corresponding calculated value using PCA or MCR-ALS bilinear models with N components:
$$d_{i,j} = \sum_{n=1}^{N} x_{i,n}\, y_{n,j} + e_{i,j}$$
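The explained variance defined above is straightforward to compute for the whole model or for one component at a time. The sketch below uses synthetic data and a hypothetical helper `explained_variance`, and reports the result as a percentage, as in the %R2 tables.

```python
import numpy as np

def explained_variance(D, D_hat):
    """%R2 = 100 * (1 - sum(e_ij^2) / sum(d_ij^2)), with e = D - D_hat."""
    E = D - D_hat
    return 100.0 * (1.0 - np.sum(E ** 2) / np.sum(D ** 2))

# Small hand-checkable case: residual sum of squares 16, total 25
r2_small = explained_variance(np.array([[3.0, 0.0], [0.0, 4.0]]),
                              np.array([[3.0, 0.0], [0.0, 0.0]]))

# Two-component PCA-type model of a synthetic augmented matrix
rng = np.random.default_rng(5)
D_aug = rng.random((51, 11))
U, s, Vt = np.linalg.svd(D_aug, full_matrices=False)
comp1 = s[0] * np.outer(U[:, 0], Vt[0])
comp2 = s[1] * np.outer(U[:, 1], Vt[1])
r2_first = explained_variance(D_aug, comp1)
r2_second = explained_variance(D_aug, comp2)
r2_total = explained_variance(D_aug, comp1 + comp2)
```

For orthogonal (PCA) components the individual values add up to the total. For non-orthogonal profiles they need not, which is consistent with the MCR-ALS rows in the tables of this talk (for example 48.2 and 42.8 against a total of 80.5).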
MA-PCA of scaled data without scores refolding
[Figure: MA-PCA loadings of the two components for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and the corresponding augmented scores; annotations: water samples; sediment and fish samples; As, Ba, Cu, Zn: water soluble metal ions]

%R2 (2-WAY)   1st component   2nd component   Total
MA-PCA        67.3            13.2            80.5
MA-MCR-ALS of scaled data with nn and without scores refolding
[Figure: MA-MCR-ALS loadings of the two components for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and the corresponding augmented scores; annotations: water samples; sediment and fish samples; As, Ba, Cu, Zn. More easily interpretable!]

%R2 (2-WAY)   1st component   2nd component   Total
MA-MCR-ALS    48.2            42.8            80.5
MA-PCA        67.3            13.2            80.5
Calculation of the boundaries of feasible band solutions (Journal of Chemometrics, 2001, 15, 627-646)
[Figure: maximum (max) and minimum (min) boundaries of the feasible bands for the augmented scores and the metal loadings of the two components]
Nearly no rotation ambiguities are present in non-negative environmental profiles calculated by MCR-ALS (very different to spectroscopy!)
Bilinear modelling of three-way data (matrix augmentation or matricizing, stretching, unfolding)
Scores refolding strategy (applied only to the final augmented scores): loadings recalculation in two modes from augmented scores
[Figure: the data cube D (sites × contaminants × compartments F, S, W) is unfolded into D_aug = X_aug Y^T and resolved by PCA or MCR-ALS; each column of X_aug is refolded into a sites × compartments matrix, and its SVD gives the site profile x and the compartment profile z (shown for components i and ii)]
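A minimal sketch of the scores refolding step, assuming the augmented scores are ordered compartment by compartment (all sites of the first compartment, then the second, and so on). The function name `refold_scores` and the sign convention are choices made for this example.

```python
import numpy as np

def refold_scores(X_aug, n_sites, n_comp):
    """Split each augmented score column into a site profile and a
    compartment profile using the first singular triplet of the refolded
    (sites x compartments) matrix."""
    N = X_aug.shape[1]
    X = np.zeros((n_sites, N))
    Z = np.zeros((n_comp, N))
    for n in range(N):
        folded = X_aug[:, n].reshape(n_comp, n_sites).T   # sites x compartments
        U, s, Vt = np.linalg.svd(folded, full_matrices=False)
        x, z = U[:, 0] * s[0], Vt[0]
        if x.sum() < 0:            # resolve the sign ambiguity of the SVD
            x, z = -x, -z
        X[:, n], Z[:, n] = x, z
    return X, Z

# Synthetic check with exactly trilinear augmented scores
rng = np.random.default_rng(6)
I, K, N = 17, 3, 2
x_true = rng.random((I, N))
z_true = rng.random((K, N))
X_aug = np.vstack([x_true * z_true[k] for k in range(K)])   # (K*I, N)
X, Z = refold_scores(X_aug, I, K)
```

When the augmented scores are not exactly trilinear, the first singular triplet is only the best rank-one approximation of the refolded matrix, which is why the three-way explained variance after refolding can be lower than the two-way value.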
Explained variances using trilinear models (profiles in three modes)
$$R^2 = 1 - \frac{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} e_{i,j,k}^2}{\sum_{i=1}^{I}\sum_{j=1}^{J}\sum_{k=1}^{K} d_{i,j,k}^2}, \qquad e_{i,j,k} = d_{i,j,k} - \hat{d}_{i,j,k}$$
where d_{i,j,k} is the experimental value in the data cube for metal j, sample i and environmental compartment k, and \hat{d}_{i,j,k} is the corresponding calculated value using PARAFAC, Tucker3, or PCA or MCR-ALS of augmented data matrices after recovery of loadings in three modes (either from scores refolding or from constraints application):
$$\hat{d}_{i,j,k} = \sum_{n=1}^{N} x_{i,n}\, y_{j,n}\, z_{k,n}$$
$$\hat{d}_{i,j,k} = \sum_{n_i=1}^{N_i}\sum_{n_j=1}^{N_j}\sum_{n_k=1}^{N_k} g_{n_i,n_j,n_k}\, x_{i,n_i}\, y_{j,n_j}\, z_{k,n_k}$$
MA-PCA of scaled data with nn and scores refolding
[Figure: profiles of the two components in the three modes: sample sites (1 to 17), compartments (F, S, W) and metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn)]

%R2 (3-WAY)          1st component   2nd component   Total
MA-PCA + refolding   64.7            11.7            76.4
MA-PCA               67.3            13.2            80.5

Little differences in samples mode!
MA-MCR-ALS of scaled data with scores refolding
[Figure: profiles of the two components in the three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

%R2 (3-WAY)              1st component   2nd component   Total
MA-MCR-ALS + refolding   47.0            40.7            76.9
MA-MCR-ALS               48.2            42.8            80.5
Trilinear modelling of three-way data
[Figure: PARAFAC decomposition of the data cube D (sites × contaminants × compartments F, S, W) into the profiles X (sites), Y (metals) and Z (compartments)]
PARAFAC of scaled data
[Figure: profiles of the two components in the three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

%R2 (3-WAY)         1st component   2nd component   Total
PARAFAC             43.4            36.2            77.4
MA-PCA (bilinear)   67.3            13.2            80.5
[Figure: core consistency plot; core consistency 99.9395% (yellow target); core elements (green should be zero, red non-zero); y-axis: core size]
MA-MCR-ALS trilinear constraint
TRILINEARITY CONSTRAINT (ALS iteration step)
[Figure: the data cube D (sites × contaminants × compartments F, S, W) is unfolded into D_aug = X_aug Y^T and resolved by MCR-ALS. Steps of the constraint: selection of species profile, folding, SVD, rebuilding augmented scores, substitution of species profile]
• Every augmented score profile that is wanted to follow the trilinear model is refolded
• Loadings recalculation in two modes from augmented scores
• This constraint is applied at each step of the ALS optimization, independently and individually for each component
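The steps listed for the trilinearity constraint (select a profile, fold, SVD, rebuild, substitute) can be sketched as a single function that would be called inside each ALS iteration. The name `trilinearity_constraint`, the `components` argument and the assumed stacking order (compartment by compartment) are illustrative and not taken from the MCR-ALS software.

```python
import numpy as np

def trilinearity_constraint(X_aug, n_sites, n_comp, components=None):
    """Replace selected augmented score columns by their rank-one (trilinear)
    approximation; columns not listed in `components` are left untouched."""
    X_new = X_aug.copy()
    selected = range(X_aug.shape[1]) if components is None else components
    for n in selected:
        # Selection of the species profile and folding (sites x compartments)
        folded = X_aug[:, n].reshape(n_comp, n_sites).T
        # SVD: keep only the first singular triplet
        U, s, Vt = np.linalg.svd(folded, full_matrices=False)
        rank1 = s[0] * np.outer(U[:, 0], Vt[0])
        # Rebuilding the augmented scores and substitution of the profile
        X_new[:, n] = rank1.T.reshape(-1)
    return X_new

rng = np.random.default_rng(7)
I, K, N = 17, 3, 2
X_aug = rng.random((K * I, N))                 # hypothetical augmented scores

X_all = trilinearity_constraint(X_aug, I, K)                   # both components
X_mixed = trilinearity_constraint(X_aug, I, K, components=[0]) # first one only
```

Applying the constraint to a subset of components, as in `X_mixed`, is what allows intermediate situations between pure bilinear and pure trilinear models.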
MA-MCR-ALS of scaled data with nn, trilinearity (without scores refolding)
[Figure: loadings of the two components for the 11 metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn) and the corresponding augmented scores]

%R2 (2-WAY)                  1st component   2nd component   Total
MA-MCR-ALS nn + trilinear    44.3            42.9            76.8
MA-MCR-ALS nn                48.2            42.8            80.5
Calculation of the boundaries of feasible band solutions (Journal of Chemometrics, 2001, 15, 627-646)
[Figure: feasible band boundaries for the augmented scores and the metal loadings of the two components under the trilinearity constraint]
No rotation ambiguities are present in trilinear non-negative environmental profiles calculated by MCR-ALS (very different to spectroscopy!)
MA-MCR-ALS of scaled data with nn, trilinearity and with scores refolding
[Figure: profiles of the two components in the three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

%R2 (3-WAY)                  1st component   2nd component   Total
MA-MCR-ALS nn + trilinear    44.3            42.9            76.8
PARAFAC nn                   43.4            36.2            77.4
Comparison PARAFAC vs MCR-ALS (trilinearity)
[Figure: two sets of panels, one per method, each showing the profiles of the two components in the three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]
Tucker3 modelling of three-way data
Tucker models with non-negativity constraints
[Figure: TUCKER3 decomposition, model (1,2,2), of the data cube D (sites × metals × compartments F, S, W) into the profiles X (sites), Y (metals), Z (compartments) and the core array G (labelled elements 12, 21, 22)]
Explained variances (%) for each TUCKER3 model studied
[Figure: explained variance (%) for each model, with labelled models [1 2 2], [1 2 3], [1 3 3], [2 2 2], [2 2 3], [2 3 3], [3 3 3]]
TUCKER3 model   Sum of Squares (%)
[1,1,1] 64.7
[1,1,2] 64.7
[1,1,3] 64.7
[1,2,1] 64.7
[1,2,2] 76.1
[1,2,3] 76.1
[1,3,1] 64.7
[1,3,2] 76.1
[1,3,3] 80.3
[2,1,1] 64.7
[2,1,2] 66.3
[2,1,3] 66.3
[2,2,1] 66.9
[2,2,2] 77.3
[2,2,3] 78.1
[2,3,1] 66.9
[2,3,2] 78.4
[2,3,3] 82.4
[3,1,1] 64.7
[3,1,2] 66.3
[3,1,3] 67.3
[3,2,1] 66.9
[3,2,2] 77.9
[3,2,3] 79.3
[3,3,1] 68.4
[3,3,2] 79.8
[3,3,3] 83.6
Parsimonious model: [1 2 2]
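One way to make the choice of a parsimonious model explicit is to look for the model with the fewest components in total that still reaches a target explained variance. The sketch below uses the values of the table above; the 75% threshold and the tie-breaking rule are assumptions made for illustration and are not stated in the slides.

```python
from itertools import product

# Explained sum of squares (%) in table order: [1,1,1], [1,1,2], ..., [3,3,3]
values = [64.7, 64.7, 64.7, 64.7, 76.1, 76.1, 64.7, 76.1, 80.3,
          64.7, 66.3, 66.3, 66.9, 77.3, 78.1, 66.9, 78.4, 82.4,
          64.7, 66.3, 67.3, 66.9, 77.9, 79.3, 68.4, 79.8, 83.6]
models = list(product((1, 2, 3), repeat=3))
explained = dict(zip(models, values))

def most_parsimonious(explained, threshold):
    """Smallest model (by total number of components) reaching the threshold;
    ties are broken in favour of the higher explained variance."""
    candidates = [m for m, v in explained.items() if v >= threshold]
    return min(candidates, key=lambda m: (sum(m), -explained[m]))

best = most_parsimonious(explained, threshold=75.0)   # hypothetical threshold
```

With this threshold the rule selects (1, 2, 2), the model labelled as parsimonious in the talk; a different threshold could of course select a different model.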
Tucker3 of scaled data
[Figure: Tucker3 profiles in the three modes: one sample sites profile, two metals profiles and two compartments profiles]

%R2 (3-WAY)               1st component   2nd component   Total
TUCKER3, model [1 2 2]    50.7            35.3            76.1
PARAFAC, model [2 2 2]    43.4            36.2            77.4
MA-MCR-ALS Tucker3 constraint
Tucker3 CONSTRAINT (ALS iteration step)
[Figure: the data cube D (sites × contaminants × compartments F, S, W) is unfolded into D_aug = X_aug Y^T (metals) and resolved by MCR-ALS. Steps of the constraint: folding, SVD, rebuilding of the augmented scores]
• Interacting augmented scores are folded together
• Loadings recalculation in two modes from augmented scores
• This constraint is applied at each step of the ALS optimization, independently and individually for each component i
MA-MCR-ALS of scaled data with nn, tucker3 (without scores refolding)
[Figure: loadings of the two components for the 11 metals and the corresponding augmented scores]

%R2 (2-WAY)                                1st component   2nd component   Total
MA-MCR-ALS nn + Tucker3, model [1 2 2]     45.2            41.4            75.8
MA-MCR-ALS nn + PARAFAC, model [2 2 2]     44.3            42.9            76.8
MA-MCR-ALS of scaled data with nn, tucker3 and with scores refolding
[Figure: profiles in the three modes: metals (As, Ba, Cd, Co, Cu, Cr, Fe, Mn, Ni, Pb, Zn), sample sites (1 to 17) and compartments (F, S, W)]

%R2 (3-WAY)                                1st component   2nd component   Total
MA-MCR-ALS nn + Tucker3, model [1 2 2]     45.2            41.4            75.8
Tucker3, model [1 2 2]                     50.7            35.3            76.1
Summary of Results

CHEMOMETRIC METHOD                                     %R2 (3-WAY)             %R2 (2-WAY)
                                                       1st    2nd    Total     1st    2nd    Total
MA-PCA (scale)                                         64.7   11.7   76.4      67.3   13.2   80.5
PARAFAC (non-negativity)                               43.4   36.2   77.4      -      -      -
TUCKER3 (non-negativity)                               50.7   35.3   76.1      -      -      -
MA-MCR-ALS (non-negativity)                            47.0   40.7   76.9      48.2   42.8   80.5
MA-MCR-ALS (non-negativity and trilinearity)           44.3   42.9   76.8      -      -      -
MA-MCR-ALS (non-negativity and Tucker restrictions)    45.2   41.4   75.8      -      -      -
INTEGRATION OF CHEMOMETRICS-GEOSTATISTICS (Geographical Information Systems, GIS)
[Figure: GIS maps of the sampling sites for the two components, labelled (67.3%) and (13.2%)]
J052 Riu Muga Castelló d'Empúries
J022 Riu Fluvià Besalú
J011 Riu Fluvià Armentera, l'
J034 Riu Ter Manlleu
J028 Riu Terri Sant Julià de Ramis
J112 Riu Ter Colomers
J062 Riu Tordera Fogars de Tordera
J037 Riu Congost Garriga, la
J031 Riu Llobregat Pont de Vilomara i Rocafort, el
J002 Riu Cardener Castellgalí
J084 Riu Llobregat Abrera
J005 Riu Llobregat Martorell
J049 Riu Llobregat Sant Joan Despí
J008 Riu Foix Castellet i la Gornal
J059 Riu Francolí Masó, la
J056 Riu Ebre Flix
E207 Riu Segre Térmens
Conclusions
Chemometric methods allow the resolution of environmental sources of chemical contaminants
However, we should be aware of how each method displays the information, because the mathematical properties of the methods differ (e.g. orthogonality vs non-negativity, bilinearity vs trilinearity, number of components, ...)
The interpretation and resolution of environmental sources is not easy, because contamination sources in the real world are correlated and because of experimental data limitations (environmental sources should show variation in the investigated data set).
Bilinear PCA and MCR-ALS can be used to study multiway data sets, and their results can be compared with those of multiway methods (like PARAFAC and Tucker3) if appropriate scores refolding is performed
Bilinear non-negative MCR-ALS solutions may provide a good approximation of the real sources because non-negative environmental profiles have little rotation ambiguity
PARAFAC and Tucker3 may provide simpler models, and they are especially useful for trilinear data or when the number of components is not the same in the different modes.
Intermediate situations between pure bilinear and pure trilinear models can be easily implemented in MCR-ALS
Bilinear based models are more flexible than trilinear based models to resolve 'true' sources of data variation
Different numbers of components and interactions between components in different modes (constraint under development) can be considered in mixed bilinear-trilinear-Tucker MA-MCR models
For an optimal RESOLUTION, the model should be in accordance with the 'true' data structure
Integration of Chemometrics-GIS results may facilitate the geographical and temporal interpretation of contamination sources and their correlation with land uses, population and industrial activities
Acknowledgements
• The Catalan Water Agency is acknowledged for its financial support and for providing the experimental data sets
• Research grant Project MCYT, Nr. BQU2003-00191, Spain