Comparison of independent process
analytical measurements — a variographic
study
Pentti Minkkinen
1) Lappeenranta University of Technology
2) Aalborg University Campus Esbjerg
E-mail: [email protected] @ Pentti Minkkinen
WSC 7, Raivola, Russia, 15-19 February, 2010
Wastewater
Atmospheric emissions
Solid wastes
Outline
• Process analytical chemistry – measurement systems
• Why comparisons are needed
– Calibration
– Performance tests of measurement systems
– Comparison of process means
• Data analysis methods
– Incorrect
– Correct
Process Analytical Chemistry
Off-line analysis
At-line analysis
On-line analysis
In-line analysis
Non-invasive analysis
Student’s t-tests
• Most widely used statistical test in testing the equality of the mean values of two measurement sets
• Basic assumptions:
– Sets independent
– Normal distribution (sensitive to non-normality)
• Assumptions seldom met in process analysis
t-tests for comparing 2 means
Two measurement sets:
$\bar{x}_1, s_1, n_1, \nu_1$
$\bar{x}_2, s_2, n_2, \nu_2$
$n_i$ = number of measurements
$\nu_i = n_i - 1$ = degrees of freedom
$d = \bar{x}_1 - \bar{x}_2$ = difference of the two means
QUESTION: Is d significantly different from zero?
A) Standard deviations of both sets can be assumed to be equal
$\sigma_1^2 = \sigma_2^2$, confirmed with an F-test:
$$F = \frac{s_1^2}{s_2^2} \quad (1)$$
If F is not significant, the standard deviations can be pooled:
$$s^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu_1 + \nu_2}, \quad \nu = \nu_1 + \nu_2 \quad (2)$$
The standard deviation of the difference, with $\nu$ degrees of freedom, is
$$s_d = \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}} \quad (3)$$
Two ways to detect if d is different from zero:
a) Confidence interval: d is significant if the confidence interval, ci, does not include zero
$$ci = d \pm t_{\alpha/2,\nu}\, s_d \quad (4)$$
b) t-test
$$t = \frac{d - 0}{s_d} \quad (5)$$
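A minimal Python sketch of case A, working from summary statistics only. The function name `pooled_t` and the input numbers are hypothetical, chosen only to illustrate the pooling of the two standard deviations, the standard deviation of the difference, and the t statistic. Critical F and t values are not computed here and would still be taken from tables.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Return (F, s_pooled, s_d, t, dof) for the equal-variance case A."""
    nu1, nu2 = n1 - 1, n2 - 1
    # Ratio of the two variances (Eq. 1); compare with a tabulated F value.
    f_ratio = s1**2 / s2**2
    # Pooled standard deviation, weighted by degrees of freedom (Eq. 2).
    dof = nu1 + nu2
    s_pooled = math.sqrt((nu1 * s1**2 + nu2 * s2**2) / dof)
    # Standard deviation of the difference of the two means (Eq. 3).
    s_d = s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    # t statistic for the hypothesis d = 0 (Eq. 5).
    t = (mean1 - mean2) / s_d
    return f_ratio, s_pooled, s_d, t, dof

# Hypothetical summary statistics, for illustration only.
f_ratio, s_pooled, s_d, t, dof = pooled_t(10.2, 0.5, 8, 9.8, 0.6, 8)
print(f"F = {f_ratio:.3f}, s = {s_pooled:.3f}, s_d = {s_d:.3f}, "
      f"t = {t:.3f}, dof = {dof}")
```

The confidence interval of Eq. 4 follows from the returned `s_d` and `dof` once the tabulated t value for the chosen confidence level is known.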
B) Standard deviations not equal
$\sigma_1^2 \neq \sigma_2^2$ (F-test significant)
The standard deviation of the difference d is calculated as
$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \quad (6)$$
The degrees of freedom have to be estimated by using Satterthwaite's formula
$$\nu^* = \frac{s_d^4}{\dfrac{s_1^4}{n_1^2 \nu_1} + \dfrac{s_2^4}{n_2^2 \nu_2}} \quad (7)$$
Significance is estimated by using either Eq. 4 or Eq. 5
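Case B can be sketched in Python in the same way. The function name `welch_t` and the input numbers are hypothetical. The sketch assumes the usual form of Satterthwaite's formula, in which the effective degrees of freedom are the squared variance of the difference divided by the sum of the squared per-set terms, each divided by its own degrees of freedom.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Return (s_d, dof, t) for the unequal-variance case B."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    # Standard deviation of the difference without pooling (Eq. 6).
    s_d = math.sqrt(v1 + v2)
    # Satterthwaite's effective degrees of freedom (Eq. 7); usually not an integer.
    dof = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    t = (mean1 - mean2) / s_d
    return s_d, dof, t

# Hypothetical summary statistics, for illustration only.
s_d, dof, t = welch_t(10.2, 0.5, 8, 9.8, 1.5, 12)
print(f"s_d = {s_d:.3f}, dof = {dof:.1f}, t = {t:.3f}")
```

The effective degrees of freedom always lie between the smaller of the two individual values and their sum, so the result is easy to sanity-check.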
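C) Parallel determinations on same samples
$d_i = x_{1i} - x_{2i}$, with mean $\bar{d}$, SD of the differences $s_d$, $n$ pairs and $\nu = n - 1$
Conclusion on significance can be based on the confidence interval
$$ci = \bar{d} \pm t_{\alpha/2,\nu} \frac{s_d}{\sqrt{n}} \quad (8)$$
or on the t-test
$$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}} \quad (9)$$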
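A short Python sketch of the paired test of case C. The function name `paired_t` and the data are hypothetical. It assumes the t statistic is the mean of the pairwise differences divided by the standard deviation of the differences over the square root of the number of pairs.

```python
import math
import statistics

def paired_t(x1, x2):
    """Return (d_mean, s_d, t, dof) for parallel determinations on the same samples."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    d_mean = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample SD, n - 1 in the denominator
    t = d_mean / (s_d / math.sqrt(n))      # Eq. 9
    return d_mean, s_d, t, n - 1

# Hypothetical parallel results from two methods on four samples.
method_1 = [1.0, 2.0, 3.0, 4.0]
method_2 = [0.5, 1.0, 2.5, 3.0]
d_mean, s_d, t, dof = paired_t(method_1, method_2)
print(f"mean difference = {d_mean:.3f}, s_d = {s_d:.3f}, t = {t:.3f}, dof = {dof}")
```

Because only the differences enter the calculation, the spread between the samples themselves (here 1.0 to 4.0) does not inflate the test.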
Pitfalls
• Tests A) and B) cannot be used to test differences between analytical methods if the tests are carried out on different samples (the between-samples variance will mask the analytical variance)
• Test C) eliminates the between-samples variance. However, if the analytical variance depends on concentration, the differences $d_i$ are not normally distributed
• All tests fail, or are inefficient, in the multivariate case if correlated variables are tested one at a time
• Autocorrelation is a problem in all of these tests
Estimation of the variance
(uncertainty) of the mean of a data
set from a dynamic (elongated) 1-D
data set
Data sets are autocorrelated, i.e., samples taken within short intervals have values which differ less than samples taken far apart, so the assumption of normality does not hold.
The error made in estimating the mean of a continuous lot from discrete samples is called the Point Selection Error, PSE.
Point selection error has two components: PSE = PSE1 + PSE2
PSE1 ... error component caused by random drift
PSE2 ... error component caused by cyclic drift
If consecutive values are autocorrelated, PSE depends on the sample selection strategy. Selection options:
• random
• stratified random
• stratified systematic
Statistics of correlated series are needed to evaluate the sampling variance of the mean of the results.
[Figure: three panels of CONCENTRATION vs. TIME (0 to 30): Random selection, Stratified selection, Systematic selection]
Sample selection modes
When sampling an autocorrelated series, the same number of samples gives different uncertainties for the mean depending on the selection strategy.
Random sampling: $s_{\bar{x}} = s_p / \sqrt{n}$, where $s_p$ is the process standard deviation
Stratified sampling: $s_{\bar{x}} = s_{str} / \sqrt{n}$
Systematic sampling: $s_{\bar{x}} = s_{sys} / \sqrt{n}$
$s_{str}$ and $s_{sys}$ are standard deviation estimates where the autocorrelation has been taken into account.
Normally $s_p > s_{str} > s_{sys}$, except in periodic processes, where $s_{sys}$ may be the largest.
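The effect of the selection mode can be illustrated with a small simulation. This is only a sketch under stated assumptions: the process is a hypothetical random walk of 300 points (a strongly autocorrelated, non-periodic series), 10 samples are taken per trial, and the error of the mean is summarized as a root-mean-square error over repeated trials. All names are illustrative.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical autocorrelated process: a random walk of N points.
N, n = 300, 10
stratum = N // n
series, level = [], 50.0
for _ in range(N):
    level += random.gauss(0.0, 1.0)
    series.append(level)
true_mean = sum(series) / N

def random_pick():
    # n positions drawn anywhere in the lot.
    return random.sample(range(N), n)

def stratified_pick():
    # One position drawn independently inside each of the n strata.
    return [k * stratum + random.randrange(stratum) for k in range(n)]

def systematic_pick():
    # One random start, then a constant interval.
    start = random.randrange(stratum)
    return [k * stratum + start for k in range(n)]

def rms_error(pick, trials=2000):
    # Root-mean-square error of the sample mean over repeated selections.
    total = 0.0
    for _ in range(trials):
        est = sum(series[i] for i in pick()) / n
        total += (est - true_mean) ** 2
    return math.sqrt(total / trials)

results = {
    "random": rms_error(random_pick),
    "stratified": rms_error(stratified_pick),
    "systematic": rms_error(systematic_pick),
}
for name, err in results.items():
    print(f"{name:>10}: RMS error of the mean = {err:.3f}")
```

For a drifting series like this one, random selection should give the largest error of the mean, which is the behavior described above for non-periodic processes.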
Systematic sampling from periodic
process
[Figure: two panels of $a_i$ vs. TIME (0 to 500) showing systematic sampling of a periodic process. Panel 1: $a_L$ = 0, $a_{sample}$ = 0.996. Panel 2: $a_L$ = 0, $a_{sample}$ = 0.689]
If too low a sampling frequency is used in sampling periodic processes, there is always a danger that the mean is biased.
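The bias can be reproduced with a synthetic example. The sketch below assumes a hypothetical pure sine wave with a period of 50 time units and a true mean of zero. It compares systematic sampling at an interval equal to the period with sampling at a much shorter interval. The numbers are illustrative and are not the data behind the figure.

```python
import math

period = 50   # hypothetical cycle length, in time units
N = 500       # ten full cycles
a = [math.sin(2 * math.pi * i / period) for i in range(N)]
lot_mean = sum(a) / N   # essentially zero over whole cycles

def systematic_mean(start, interval):
    """Mean of the values picked at a constant interval from a given start."""
    picks = a[start::interval]
    return sum(picks) / len(picks)

slow = systematic_mean(12, 50)  # interval equals the period: same phase every time
fast = systematic_mean(12, 7)   # short interval: all phases of the cycle are visited

print(f"lot mean = {lot_mean:.3f}")
print(f"interval 50: sample mean = {slow:.3f}")
print(f"interval 7:  sample mean = {fast:.3f}")
```

When the interval matches the period, every sample hits the same point of the cycle, so the sample mean reflects that phase rather than the lot mean.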
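Estimation of PSE by variography
Variographic experiment: N samples are collected at equal distances and analyzed; $a_i$, $M_{si}$ and $\bar{M}_s$ are the analytical results, sample sizes and mean sample size, respectively.
Mean of the process:
$$a_L = \frac{\sum_i M_{si}\, a_i}{\sum_i M_{si}} \quad (10)$$
Relative heterogeneity of the process:
$$h_i = \frac{a_i - a_L}{a_L} \cdot \frac{M_{si}}{\bar{M}_s}, \quad i = 1, 2, \ldots, N \quad (11)$$
Absolute heterogeneity of the process:
$$h_i = (a_i - a_L) \cdot \frac{M_{si}}{\bar{M}_s}, \quad i = 1, 2, \ldots, N \quad (12)$$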
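The heterogeneity definitions can be sketched in Python as follows. The function name `heterogeneity` and the example values are hypothetical. The sketch assumes the process mean is the mass-weighted mean of the analytical results and that each deviation from that mean is weighted by the sample size relative to the mean sample size.

```python
def heterogeneity(a, masses, relative=True):
    """Return (a_L, h) for analytical results a and sample sizes masses."""
    total_mass = sum(masses)
    mean_mass = total_mass / len(masses)
    # Mass-weighted mean of the process, a_L.
    a_L = sum(m * x for m, x in zip(masses, a)) / total_mass
    # Relative heterogeneity divides by a_L; absolute heterogeneity does not.
    scale = a_L if relative else 1.0
    h = [(x - a_L) / scale * m / mean_mass for x, m in zip(a, masses)]
    return a_L, h

# Hypothetical results with equal sample sizes.
a_L, h_rel = heterogeneity([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
_, h_abs = heterogeneity([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], relative=False)
print(a_L, h_rel, h_abs)
```

With equal sample sizes the mass weights cancel and the heterogeneity values are simply the (relative or absolute) deviations from the mean.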
Variogram of heterogeneity as a function of the sampling interval j:
$$V_j = \frac{1}{2(N-j)} \sum_{i=1}^{N-j} (h_{i+j} - h_i)^2, \quad j = 1, 2, \ldots, N/2 \quad (13)$$
To estimate variances the variogram has to be integrated (numerically in Gy's method).
Analysis of the variogram provides variance estimates for the mean of the data set obtained by the random, stratified or systematic sample selection mode:
$$\operatorname{var}(a_L)_{ra,\,str,\,sys} = \frac{s^2_{ra,\,str,\,sys}(j)}{n} \quad (14)$$
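A minimal Python sketch of the experimental variogram: for each lag j, half the mean squared difference between heterogeneity values that lie j samples apart, for lags up to half the series length. The function name `variogram` and the test series are hypothetical. The numerical integration used in Gy's method to obtain the stratified and systematic variances is not included.

```python
def variogram(h, max_lag=None):
    """Return [V_1, V_2, ..., V_max_lag] for the heterogeneity series h."""
    N = len(h)
    if max_lag is None:
        max_lag = N // 2
    V = []
    for j in range(1, max_lag + 1):
        # Sum of squared differences between points j steps apart.
        sq = sum((h[i + j] - h[i]) ** 2 for i in range(N - j))
        V.append(sq / (2 * (N - j)))
    return V

# Hypothetical series: a pure linear drift and a strict two-point cycle.
drift = [float(i) for i in range(10)]
cycle = [1.0, -1.0] * 5
print(variogram(drift))   # grows steadily with the lag
print(variogram(cycle))   # alternates between a high and a low value
```

The two toy series show the signatures described in the slides: a drifting process gives a steadily increasing variogram, while a periodic process gives a variogram that oscillates with the same period.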
[Figure: VARIOGRAMS FOR THREE DIFFERENT BASIC PROCESS TYPES. Rows: Random, Periodic, Non-periodic drift. Left column: PROCESS ($a_i$ vs. Sample #). Right column: VARIOGRAM (V vs. Sample lag, j)]
[Figure: $a_i$ vs. Sample # (left) and V vs. Sample lag, j (right)]
Data and variogram of a complex process
Interpretation of the variogram
[Figure: schematic variogram, V vs. SAMPLE LAG, j, with the following features marked]
V0: random effects (sampling, preparation, analysis)
Sill
Range
Vp
V(drift)
2 V(cyclic)
Examples
• Simultaneous emission measurements by
two different process analyzers/teams
from a power plant
– NOx emission
– O2 in stack
NOx emission: Control measurements
[Figure: NOx1, CONCENTRATION (ppm) vs. TIME (h), control and routine series]
Control: $\bar{x}_1$ = 182.4, $s_1$ = 7.1
Routine: $\bar{x}_2$ = 172.1, $s_2$ = 8.3
d = 10.3, t-test significant
Results of control vs. routine measurements, two process analyzers
[Figure: NOx1 scatter plot, CONTROL vs. ROUTINE]
Routine vs. control measurement of NOx emissions
[Figure: NOx1, VARIOGRAMS vs. LAG (j), control and routine series]
Variograms of the routine and control measurement sets
Oxygen
[Figure: CONCENTRATION (% O2) vs. TIME (h), control and routine series]
Results of control vs. routine measurements: parallel O2 measurements using two different process analyzers
Control: $\bar{x}_1$ = 6.77, $s_1$ = 0.191
Routine: $\bar{x}_2$ = 6.78, $s_2$ = 0.253
t = 0.224, not significant
[Figure: O2 scatter plot, CONTROL (% O2) vs. ROUTINE (% O2)]
Control vs. routine measurements, results of linear regression analysis:
Intercept = -0.3004 (95 % ci = -1.76 ... 1.16)
Slope = 1.046 (95 % ci = 0.83 ... 1.26)
[Figure: CONTROL - ROUTINE (% O2) vs. SAMPLE #]
Differences of the control - routine measurements
$\bar{d}$ = 0.0077, $s_d$ = 0.125
t = 0.224, not significant
[Figure: O2, VARIOGRAMS vs. LAG (h), routine and control series]
Variograms of the routine and control measurement sets
[Figure: O2, Standard Deviations, s (% O2) vs. LAG (h), routine and control series]
Standard deviations of systematic sampling mode estimated from the variograms of the routine and control measurement sets
[Figure: V(d) vs. LAG (h)]
Variogram of the difference d, control - routine measurement of O2
Comparison of two different
process means
Does process change affect the
result (or behavior of the process)?
[Figure: CONC. (mg/m3) vs. TIME (h), 0 to 800 h]
NOx emissions from a power plant within two different time periods, 745 and 738 data points
Period 1: $\bar{x}_1$ = 214.8, $s_1$ = 15.4
Period 2: $\bar{x}_2$ = 216.5, $s_2$ = 23.2
d = 1.7, t = 1.68, not significant
[Figure: V (mg/m3)^2 vs. LAG (h)]
Variograms of the data sets from two different time spans
[Figure: s (mg/m3) vs. LAG (h)]
Standard deviations of systematic sampling mode estimated from the variograms of the two time spans
t-test taking the autocorrelation into account
$\bar{x}_1$ = 214.8, $s_1$ = 5.95, $n_1$ = 745
$\bar{x}_2$ = 216.5, $s_2$ = 5.49, $n_2$ = 738
t3 = 5.78 SIGNIFICANT
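The two t values for the NOx periods can be checked from the summary statistics quoted on the slides. The sketch below assumes that the standard deviation of the difference is the square root of the sum of each variance divided by its number of data points (the unpooled form), and that the rounded means 214.8 and 216.5 are used. The function name `t_from_summary` is hypothetical.

```python
import math

def t_from_summary(mean1, s1, n1, mean2, s2, n2):
    """t statistic for the difference of two means from summary statistics."""
    s_d = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return abs(mean2 - mean1) / s_d

# Ordinary standard deviations of the two periods (15.4 and 23.2).
t_classical = t_from_summary(214.8, 15.4, 745, 216.5, 23.2, 738)
# Autocorrelation-adjusted standard deviations (5.95 and 5.49).
t_adjusted = t_from_summary(214.8, 5.95, 745, 216.5, 5.49, 738)

print(f"classical t = {t_classical:.2f}")
print(f"adjusted t  = {t_adjusted:.2f}")
```

Under these assumptions both results come out close to the slide values of 1.68 and 5.78. The small remaining gap is consistent with the means having been rounded to one decimal on the slides. The same difference of about 1.7 mg/m3 therefore moves from not significant to significant once the smaller, variogram-based standard deviations are used.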
CONCLUSIONS
• Before selecting the statistical tools and
drawing inferences try to plot the data so
that it shows the desired phenomenon
• Variographic analysis of time series is a
powerful tool. It separates random effects,
non-periodic and periodic drift
• In the multivariate case, variographic analysis
can be carried out, e.g., on PCA scores
Спасибо
.
THANK YOU, tack, kiitos, danke, merci, obrigado,
gracias, grazie, tesekkur ederim, sukran,..
Graduate courses at Lappeenranta University of Technology (in English)
• Experimental Design, 25-26 March, 2010
• Sampling for Chemical Analysis, 7-9 April, 2010