
Comparison of independent process analytical measurements: a variographic study

Pentti Minkkinen

1) Lappeenranta University of Technology

2) Aalborg University Campus Esbjerg

E-mail: Pentti.Minkkinen@lut.fi

Copyright © Pentti Minkkinen

WSC 7, Raivola, Russia, 15-19 February, 2010

Wastewater

Atmospheric emissions

Solid wastes

Outline

• Process analytical chemistry – measurement systems

• Why comparisons are needed

– Calibration

– Performance tests of measurement systems

– Comparison of process means

• Data analysis methods

– Incorrect

– Correct

Process Analytical Chemistry

• Off-line analysis
• At-line analysis
• On-line analysis
• In-line analysis
• Non-invasive analysis

Student’s t-tests

• The most widely used statistical test for the equality of the mean values of two measurement sets

• Basic assumptions:

– Sets independent

– Normal distribution (sensitive to non-normality)

• Assumptions seldom met in process analysis

t-tests for comparing 2 means

Two measurement sets:

Set 1: $\bar{x}_1, s_1, n_1, \nu_1$

Set 2: $\bar{x}_2, s_2, n_2, \nu_2$

$n_i$ = number of measurements

$\nu_i = n_i - 1$ = degrees of freedom

$d = \bar{x}_1 - \bar{x}_2$ = difference of the two means

QUESTION: Is d significantly different from zero?

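A) Standard deviations of both sets can be assumed to be equal, $\sigma_1^2 = \sigma_2^2$, confirmed with an F-test:

$F = \frac{s_1^2}{s_2^2}$ (1)

If this is not significant, the standard deviations can be pooled:

$s^2 = \frac{\nu_1 s_1^2 + \nu_2 s_2^2}{\nu_1 + \nu_2}, \quad \nu = \nu_1 + \nu_2$ (2)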

The standard deviation of the difference, with $\nu$ degrees of freedom, is

$s_d = \sqrt{\frac{s^2}{n_1} + \frac{s^2}{n_2}}$ (3)

Two ways to detect if d is different from zero:

a) The confidence interval, ci, does not include zero:

$ci = d \pm t_{\alpha/2,\nu}\, s_d$ (4)

b) t-test:

$t = \frac{d - 0}{s_d}$ (5)
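Case A can be sketched in a few lines of Python, working from summary statistics only. This is a minimal sketch, not the author's code: the function names, the input values and the tabulated critical value 2.101 (two-sided 95 %, 18 degrees of freedom) are illustrative assumptions, and d is taken as the first mean minus the second.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance comparison of two means from summary statistics."""
    nu1, nu2 = n1 - 1, n2 - 1
    f_ratio = s1**2 / s2**2                                  # Eq. 1, compare with an F table
    s2_pooled = (nu1 * s1**2 + nu2 * s2**2) / (nu1 + nu2)    # Eq. 2
    nu = nu1 + nu2
    s_d = math.sqrt(s2_pooled / n1 + s2_pooled / n2)         # Eq. 3
    d = mean1 - mean2
    return {"F": f_ratio, "nu": nu, "d": d, "s_d": s_d, "t": d / s_d}  # Eq. 5

def confidence_interval(d, s_d, t_crit):
    """Eq. 4; t_crit must be looked up for the chosen level and nu."""
    return d - t_crit * s_d, d + t_crit * s_d

# Hypothetical summary values, for illustration only
res = pooled_t(10.2, 0.5, 10, 9.8, 0.6, 10)
lo, hi = confidence_interval(res["d"], res["s_d"], t_crit=2.101)
print(res)
print("95 % ci:", (lo, hi), "includes zero:", lo < 0 < hi)
```

With these made-up numbers the interval includes zero, so d would not be judged significant; the t value leads to the same conclusion when compared with the same critical value.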

B) Standard deviations not equal, $\sigma_1^2 \neq \sigma_2^2$ (F-test significant)

The standard deviation of the difference d is calculated as

$s_d^* = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ (6)

The degrees of freedom have to be estimated by using Satterthwaite's formula

$\nu^* = \frac{(s_d^*)^4}{\frac{s_1^4}{n_1^2 \nu_1} + \frac{s_2^4}{n_2^2 \nu_2}}$ (7)

Significance is estimated by using either Eq. 4 or Eq. 5.
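For case B, the standard deviation of the difference and the Satterthwaite degrees of freedom can be computed as follows. This is a sketch under stated assumptions: the function name and the summary values are hypothetical, and the critical t value for the resulting (generally non-integer) degrees of freedom still has to be looked up separately.

```python
import math

def unequal_variance_t(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic without pooling, with Satterthwaite's nu."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    s_d = math.sqrt(v1 + v2)                                        # Eq. 6
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))     # Eq. 7
    t = (mean1 - mean2) / s_d                                       # Eq. 5 with s_d from Eq. 6
    return t, nu, s_d

# Hypothetical summary values, for illustration only
t, nu, s_d = unequal_variance_t(10.2, 0.3, 8, 9.6, 0.9, 12)
print(f"t = {t:.3f}, nu = {nu:.1f}, s_d = {s_d:.3f}")
```

Note that the estimated degrees of freedom fall between the smaller of the two individual values and their sum, and move towards the less precise set as its variance dominates.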

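C) Parallel determinations on same samples

$d_i = x_{1i} - x_{2i}$, summarized by $\bar{d}$, $s_d$, $n$ and $\nu = n - 1$, where $s_d$ is the SD of the differences.

The conclusion on significance can be based on the confidence interval

$ci = \bar{d} \pm t_{\alpha/2,\nu} \frac{s_d}{\sqrt{n}}$ (8)

or on the t-test

$t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}$ (9)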
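The paired test of case C works on the sample-by-sample differences. The sketch below assumes two hypothetical result lists measured on the same six samples; the function name and the data are illustrative only.

```python
import math
import statistics

def paired_t(x1, x2):
    """Paired comparison: mean and SD of the differences, nu, and t."""
    d = [a - b for a, b in zip(x1, x2)]
    n = len(d)
    d_mean = statistics.mean(d)
    s_d = statistics.stdev(d)                 # SD of the differences
    t = d_mean / (s_d / math.sqrt(n))         # Eq. 9
    return d_mean, s_d, n - 1, t

# Hypothetical parallel determinations on the same six samples
method_1 = [5.1, 7.9, 12.2, 15.8, 20.3, 24.9]
method_2 = [5.0, 8.1, 12.0, 15.9, 20.0, 25.1]

d_mean, s_d, nu, t = paired_t(method_1, method_2)
print(f"mean d = {d_mean:.4f}, s_d = {s_d:.4f}, nu = {nu}, t = {t:.3f}")
```

Here t is far below the tabulated two-sided 95 % critical value for 5 degrees of freedom (about 2.57), so these made-up data show no significant difference between the methods.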

Pitfalls

• Tests A) and B) cannot be used to test differences of analytical methods if the tests are carried out on different samples (the between-samples variance will mask the analytical variance)

• Test C) eliminates the between-samples variance. However, if the analytical variance depends on concentration, the differences $d_i$ are not normally distributed

• All tests fail, or are inefficient, in the multivariate case if correlated variables are tested one at a time

• Autocorrelation is a problem in all of these tests
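The first pitfall can be illustrated numerically. In the sketch below, all data are hypothetical: eight samples cover a wide concentration range, and a second method reads 0.5 units higher on average. Comparing the two means as if the sets were unrelated hides the bias, while the paired test reveals it.

```python
import math
import statistics

# Hypothetical data: samples spanning a wide concentration range
method_a = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
method_b = [x + 0.5 + e for x, e in zip(method_a, noise)]  # constant bias of 0.5
n = len(method_a)

# Unpaired comparison of the means (as in tests A and B):
# the between-samples variance dominates s_d
s_d_unpaired = math.sqrt(
    statistics.variance(method_a) / n + statistics.variance(method_b) / n
)
t_unpaired = (statistics.mean(method_b) - statistics.mean(method_a)) / s_d_unpaired

# Paired comparison (test C): the between-samples variance cancels out
d = [b - a for a, b in zip(method_a, method_b)]
t_paired = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))

print(f"unpaired t = {t_unpaired:.3f}, paired t = {t_paired:.1f}")
```

The same bias gives a t value close to zero in the unpaired comparison and a large one in the paired comparison.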

Estimation of the variance (uncertainty) of the mean of a dynamic (elongated) 1-D data set

Data sets are autocorrelated, i.e., samples taken within short intervals have values which differ less than samples taken far apart, so the assumption of normality does not hold.

The error made in estimating the mean of a continuous object (lot) from discrete samples is called the Point Selection Error, PSE.

Point selection error has two components: PSE = PSE1 + PSE2

PSE1 ... error component caused by random drift

PSE2 ... error component caused by cyclic drift

If consecutive values are autocorrelated, PSE depends on the sample selection strategy. Selection options:

• random
• stratified random
• stratified systematic

Statistics of correlated series are needed to evaluate the sampling variance of the mean of the results.

Sample selection modes

[Figure: three panels showing random, stratified and systematic selection of samples from the same series, concentration vs. time.]

When sampling an autocorrelated series, the same number of samples gives different uncertainties for the mean depending on the selection strategy.

Random sampling: $s_{\bar{x}} = \frac{s_p}{\sqrt{n}}$, where $s_p$ is the process standard deviation

Stratified sampling: $s_{\bar{x}} = \frac{s_{str}}{\sqrt{n}}$

Systematic sampling: $s_{\bar{x}} = \frac{s_{sys}}{\sqrt{n}}$

$s_{str}$ and $s_{sys}$ are standard deviation estimates where the autocorrelation has been taken into account. Normally $s_p > s_{str} > s_{sys}$, except in periodic processes, where $s_{sys}$ may be the largest.
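The effect of the selection mode can be checked with a small simulation. The sketch below is an illustration under assumptions that are not in the slides: the process is modelled as a non-periodic first-order autoregressive series, 20 samples are taken from 400 points, and the error of the sample mean is measured against the true mean of each simulated series. All names and parameter values are hypothetical.

```python
import math
import random

def ar1_series(n_points, phi, rng):
    """Zero-mean AR(1) series: x[t] = phi * x[t-1] + e[t], e ~ N(0, 1)."""
    x = [rng.gauss(0.0, 1.0) / math.sqrt(1.0 - phi**2)]  # stationary start
    for _ in range(n_points - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x

def pick_indices(mode, n_points, n_samples, rng):
    stratum = n_points // n_samples
    if mode == "random":
        return rng.sample(range(n_points), n_samples)
    if mode == "stratified":   # one random point in each stratum
        return [k * stratum + rng.randrange(stratum) for k in range(n_samples)]
    if mode == "systematic":   # random start, then equal spacing
        start = rng.randrange(stratum)
        return [k * stratum + start for k in range(n_samples)]
    raise ValueError(mode)

def rms_error_of_mean(mode, n_rep=1000, n_points=400, n_samples=20, phi=0.9, seed=1):
    """Root-mean-square error of the sample mean over many simulated series."""
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(n_rep):
        x = ar1_series(n_points, phi, rng)
        true_mean = sum(x) / n_points
        idx = pick_indices(mode, n_points, n_samples, rng)
        est = sum(x[i] for i in idx) / n_samples
        sq += (est - true_mean) ** 2
    return math.sqrt(sq / n_rep)

errors = {m: rms_error_of_mean(m) for m in ("random", "stratified", "systematic")}
for mode, err in errors.items():
    print(f"{mode:>10}: RMS error of the mean = {err:.3f}")
```

For a positively autocorrelated, non-periodic series like this one, random selection should give the largest error, in line with the ordering stated above.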

Systematic sampling from periodic process

[Figure: two panels, $a_i$ vs. time, showing systematic sampling of a periodic process. In both panels the true mean is $a_L = 0$; the sample means are $a_{sample} = 0.996$ and $a_{sample} = 0.689$.]

If too low a sampling frequency is used when sampling periodic processes, there is always a danger that the mean is biased.
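The bias is easy to reproduce with a synthetic sine wave. In the sketch below the period, series length, start point and sampling intervals are arbitrary illustrative choices, not the values behind the figure.

```python
import math

period = 50
n_points = 500   # ten full periods, so the true mean is zero
a = [math.sin(2 * math.pi * i / period) for i in range(n_points)]
a_L = sum(a) / n_points

def systematic_mean(series, start, interval):
    """Mean of the values picked at equal intervals from a given start."""
    picked = series[start::interval]
    return sum(picked) / len(picked)

# Sampling interval equal to the period: every sample hits the same phase
mean_same_period = systematic_mean(a, start=12, interval=50)
# Shorter interval that is not a multiple of the period
mean_short_interval = systematic_mean(a, start=12, interval=7)

print(f"true mean            = {a_L:.3f}")
print(f"interval = period    -> {mean_same_period:.3f}")
print(f"interval = 7 points  -> {mean_short_interval:.3f}")
```

When the interval matches the period, the sample mean simply reproduces the phase at which sampling started, here close to the peak of the wave, regardless of how many samples are taken.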

Estimation of PSE by variography

Variographic experiment: N samples are collected at equal distances and analyzed; $a_i$, $M_{si}$ and $\bar{M}_s$ are the analytical results, sample sizes and mean sample size, respectively.

Mean of the process:

$a_L = \frac{\sum_i M_{si}\, a_i}{\sum_i M_{si}}$ (10)

Relative heterogeneity of the process:

$h_i = \frac{a_i - a_L}{a_L} \cdot \frac{M_{si}}{\bar{M}_s}, \quad i = 1, 2, \ldots, N$ (11)

Absolute heterogeneity of the process:

$h_i = (a_i - a_L) \cdot \frac{M_{si}}{\bar{M}_s}, \quad i = 1, 2, \ldots, N$ (12)
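The process mean and the heterogeneity values defined above can be computed directly from the analytical results and sample sizes. A minimal sketch, with hypothetical function names and made-up data:

```python
def process_mean(a, m):
    """Sample-size-weighted mean a_L of the analytical results."""
    return sum(mi * ai for ai, mi in zip(a, m)) / sum(m)

def heterogeneity(a, m, relative=True):
    """Relative (default) or absolute heterogeneity h_i of each sample."""
    a_L = process_mean(a, m)
    m_bar = sum(m) / len(m)
    scale = a_L if relative else 1.0
    return [(ai - a_L) / scale * (mi / m_bar) for ai, mi in zip(a, m)]

# Hypothetical analytical results and sample sizes
a = [4.0, 6.0, 5.0, 5.0]
m = [1.0, 1.0, 2.0, 2.0]

a_L = process_mean(a, m)
h_rel = heterogeneity(a, m)
h_abs = heterogeneity(a, m, relative=False)
print("a_L   =", a_L)
print("h_rel =", h_rel)
print("h_abs =", h_abs)
```

Because $a_L$ is weighted by sample size, the heterogeneities always sum to zero, which is a convenient check on the calculation.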

Variogram of heterogeneity as a function of the sampling interval j:

$V_j = \frac{1}{2(N-j)} \sum_{i=1}^{N-j} (h_{i+j} - h_i)^2, \quad j = 1, 2, \ldots, \frac{N}{2}$ (13)

To estimate variances, the variogram has to be integrated (numerically in Gy's method).

Analysis of the variogram provides variance estimates for the mean of the data set obtained by the random (ra), stratified (st) or systematic (sy) sample selection mode:

$\mathrm{var}(a_L) = \frac{s^2_{ra,st,sy}(j)}{n}$ (14)
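The experimental variogram is a short calculation once the heterogeneity values are available. The sketch below covers only the variogram itself, not the numerical integration of Gy's method; the function name and the test series are illustrative assumptions.

```python
def variogram(h, max_lag=None):
    """Experimental variogram V_j for lags j = 1 .. max_lag (default N/2)."""
    n = len(h)
    if max_lag is None:
        max_lag = n // 2
    v = {}
    for j in range(1, max_lag + 1):
        sq = sum((h[i + j] - h[i]) ** 2 for i in range(n - j))
        v[j] = sq / (2 * (n - j))
    return v

# Hypothetical heterogeneity series with a period of two samples
h = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
v = variogram(h)
print(v)
```

For this strictly alternating series the variogram is large at odd lags and zero at even lags, which shows how a periodic process leaves a periodic pattern in its variogram.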

VARIOGRAMS FOR THREE DIFFERENT BASIC PROCESS TYPES

[Figure: three rows (random, periodic, non-periodic drift). Left panels: PROCESS, $a_i$ vs. sample #. Right panels: VARIOGRAM, V vs. sample lag j.]

[Figure: two panels. Left: $a_i$ vs. sample #. Right: V vs. sample lag j.]

Data and variogram of a complex process

Interpretation of the variogram

[Figure: schematic variogram, V vs. sample lag j, with the following features marked.]

• V0: random effects (sampling, preparation, analysis)
• Vp
• Sill
• Range
• V(drift)
• 2 V(cyclic)

Examples

• Simultaneous emission measurements by two different process analyzers/teams from a power plant

– NOx emission

– O2 in stack

NOx emission: Control measurements

[Figure: NOx1, concentration (ppm) vs. time (h), control and routine measurements.]

Control: $\bar{x}_1 = 182.4$, $s_1 = 7.1$

Routine: $\bar{x}_2 = 172.1$, $s_2 = 8.3$

d = 10.3, t-test significant

Results of control vs. routine measurements, two process analyzers

[Figure: NOx1, control vs. routine.]

Routine vs. control measurement of NOx emissions

[Figure: NOx1, variograms vs. lag (j), control and routine.]

Variograms of the routine and control measurement sets

Oxygen

[Figure: concentration (% O2) vs. time (h), control and routine measurements.]

Results of control vs. routine measurements: parallel O2 measurements using two different process analyzers

Control: $\bar{x}_1 = 6.77$, $s_1 = 0.191$

Routine: $\bar{x}_2 = 6.78$, $s_2 = 0.253$

t = 0.224, not significant

[Figure: O2, control (% O2) vs. routine (% O2).]

Control vs. routine measurements, results of linear regression analysis:

Intercept = -0.3004 (95 % ci = -1.76 ... 1.16)

Slope = 1.046 (95 % ci = 0.83 ... 1.26)

[Figure: control - routine difference (% O2) vs. sample #.]

Differences of the control - routine measurements

$\bar{d} = 0.0077$, $s_d = 0.125$

t = 0.224, not significant

[Figure: O2, variograms vs. lag (h), routine and control.]

Variograms of the routine and control measurement sets

[Figure: O2, standard deviations, s (% O2) vs. lag (h), routine and control.]

Standard deviations of systematic sampling mode estimated from the variograms of the routine and control measurement sets

[Figure: V(d) vs. lag (h).]

Variogram of the difference d, control - routine measurement of O2

Comparison of two different process means

Does a process change affect the result (or the behavior of the process)?

[Figure: NOx concentration (mg/m3) vs. time (h).]

NOx emissions from a power plant within two different time periods, 745 and 738 data points

Period 1: $\bar{x}_1 = 214.8$, $s_1 = 15.4$

Period 2: $\bar{x}_2 = 216.5$, $s_2 = 23.2$

d = 1.7

t = 1.68, not significant

[Figure: V (mg/m3)^2 vs. lag (h).]

Variograms of the data sets from two different time spans

[Figure: s (mg/m3) vs. lag (h).]

Standard deviations of systematic sampling mode estimated from the variograms of the two time spans

t-test taking the autocorrelation into account

$\bar{x}_1 = 214.8,\ s_1 = 5.95,\ n_1 = 745$

$\bar{x}_2 = 216.5,\ s_2 = 5.49,\ n_2 = 738$

t3 = 5.78, SIGNIFICANT
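The effect of the autocorrelation correction can be checked with the rounded summary values from these slides. The sketch assumes that the test statistic has the form of Eq. 6, with the variogram-based standard deviations simply replacing the ordinary ones; the slides do not state this explicitly, and the function name is hypothetical.

```python
import math

def t_from_summary(mean1, s1, n1, mean2, s2, n2):
    """t statistic for the difference of two means, unequal variances."""
    s_d = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return abs(mean1 - mean2) / s_d

# Ordinary standard deviations of the two periods (autocorrelation ignored)
t_ordinary = t_from_summary(214.8, 15.4, 745, 216.5, 23.2, 738)
# Variogram-based standard deviations (autocorrelation taken into account)
t_variographic = t_from_summary(214.8, 5.95, 745, 216.5, 5.49, 738)

print(f"ordinary SDs:     t = {t_ordinary:.2f}")
print(f"variographic SDs: t = {t_variographic:.2f}")
```

Under this assumption the rounded inputs give values close to the reported 1.68 and 5.78; the small differences are presumably due to rounding of the means and standard deviations.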

CONCLUSIONS

• Before selecting the statistical tools and drawing inferences, try to plot the data so that it shows the desired phenomenon

• Variographic analysis of time series is a powerful tool. It separates random effects, non-periodic drift and periodic drift

• In the multivariate case, variographic analysis can be carried out, e.g., on PCA scores

Спасибо.

THANK YOU, tack, kiitos, danke, merci, obrigado, gracias, grazie, tesekkur ederim, sukran, ...

Graduate courses at Lappeenranta University of Technology (in English)

• Experimental Design, 25-26 March, 2010

• Sampling for Chemical Analysis, 7-9 April, 2010
