This article was downloaded by: [University of Saskatchewan Library]
On: 19 November 2014, At: 16:39
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
Analytical Letters
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/lanl20
Informational Analysis of Variance Applied to Method-Comparison. A Comparative Study
Costel Sârbu, Department of Analytical Chemistry, “Babeş-Bolyai” University, RO-3400, Cluj-Napoca, Roumania
Published online: 17 Aug 2006.
To cite this article: Costel Sârbu (1997) Informational Analysis of Variance Applied to Method-Comparison. A Comparative Study, Analytical Letters, 30:5, 1051-1063, DOI: 10.1080/00032719708002317
To link to this article: http://dx.doi.org/10.1080/00032719708002317
ANALYTICAL LETTERS, 30(5), 1051-1063 (1997)
INFORMATIONAL ANALYSIS OF VARIANCE APPLIED TO
METHOD-COMPARISON. A COMPARATIVE STUDY
Key words: analysis of variance, informational energy, statistical tests, method-comparison
Costel Sârbu
Department of Analytical Chemistry, "Babeş-Bolyai" University,
RO-3400 Cluj-Napoca, Romania
ABSTRACT
The results obtained in the determination of mercury in solid wastes by AAS using two preparation methods¹ are compared through statistical parametric and non-parametric tests, linear regression and informational analysis of variance. The informational analysis of variance (IANOVA) method is a distribution-free procedure valid under minimal assumptions. It is not influenced by the range of the data and has very satisfactory robustness properties. Applying this algorithm to compare the effectiveness of the traditional water-bath digestion method with the microwave digestion method discussed in ref. 1, it was possible to assess the proportional errors introduced by the microwave digestion method in the analysis of mercury in waste samples.
Copyright © 1997 by Marcel Dekker, Inc.
INTRODUCTION
The informational energy (IE) concept and its detailed theoretical study, as well as its implications in the field of mathematics called "informational statistics", were introduced by Onicescu² and by Onicescu and Ştefănescu³.
It is well known to chemometricians from information theory that the Shannon entropy may be calculated from the probability $p_i$ associated with each of the states of a system⁴⁻⁶:
$$H(A) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$
Onicescu observed that H(A) is the mean value of $-\log_2 p_i$ over all the probabilities, and he addressed the question of whether the mean value of the probabilities themselves,
$$E(A) = \sum_{i=1}^{n} p_i \cdot p_i = \sum_{i=1}^{n} p_i^2 \qquad (2)$$
could not be a function "with similar characteristics of representation like Shannon's entropy". Onicescu named it informational energy (IE).
According to Mihoc's considerations⁷ concerning the estimation of IE, if the probabilities $p_i$ ($i = 1, 2, \dots, n$) of a finite set of states are estimated by the relative frequencies $f_i$ of a real experiment, then the empirical IE may be calculated with the following expression:
$$\hat{E}(A) = \sum_{i=1}^{n} f_i^2 \qquad (3)$$
Equations 2 and 3 give information concerning the degree of organization of a system or the mode of partition of its elements. Defined in this way, IE reveals some remarkable properties. First, it reaches its minimum value when all the probabilities are equal ($p_1 = p_2 = \dots = p_n$), i.e., in the case of totally unorganized systems:
$$E(p_1, p_2, \dots, p_n) = \frac{1}{n} \qquad (4)$$
If $p_k = 1$ and $p_{i \neq k} = 0$, i.e., in the case of well-organized systems, then IE reaches its maximum value:
$$E(p_1, p_2, \dots, p_n) = 1 \qquad (5)$$
Hence the possible values of IE lie between $1/n$ and 1.
IE describes the uniformity or diversity of a system, process or phenomenon as successfully as Shannon's entropy. It should, however, be remembered that H(A) is a logarithmic quantity, so that IE appears in a certain way to be more sensitive than the entropy to modifications of the system. Moreover, this informational function permits the calculation of some parameters of interest in analytical chemistry, such as the informational correlation (IC) and the informational correlation coefficient (ICC)⁸⁻¹⁰.
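The bounds of IE can be checked numerically. The short Python sketch below (function names are illustrative, not taken from the paper) evaluates Shannon's entropy, the informational energy $\sum p_i^2$ and the empirical IE computed from the relative frequencies of observed states:

```python
import math
from collections import Counter

def shannon_entropy(p):
    """Shannon entropy H = -sum p_i log2 p_i; states with p_i = 0 contribute nothing."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def informational_energy(p):
    """Onicescu's informational energy E = sum p_i^2."""
    return sum(pi * pi for pi in p)

def empirical_ie(observations):
    """Empirical IE: sum of the squared relative frequencies f_i of the observed states."""
    counts = Counter(observations)
    total = len(observations)
    return sum((c / total) ** 2 for c in counts.values())

n = 4
uniform = [1 / n] * n             # totally unorganized system: minimum IE
certain = [1.0, 0.0, 0.0, 0.0]    # well-organized system: maximum IE
print(informational_energy(uniform), informational_energy(certain))  # 0.25 1.0
print(shannon_entropy(uniform))                                      # 2.0
print(empirical_ie("aabbbc"))     # f = 2/6, 3/6, 1/6, so IE = 14/36
```

For a uniform distribution over n states the IE equals 1/n, and for a certain outcome it equals 1, in agreement with Eqns. 4 and 5.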
INFORMATIONAL ANALYSIS OF VARIANCE
The objective in an analysis of variance (ANOVA) is to isolate and assess
sources of variation associated with independent experimental variables and to
determine how these variables interact and affect the response.
In analytical chemistry, ANOVA methods are mostly used to investigate the significance of the difference between the overall mean of q subpopulations and an assumed value $\mu_0$ for the population mean. Two different null hypotheses are tested. The first is that the q subpopulations have the same mean, $\mu_1 = \mu_2 = \dots = \mu_q$, where $\mu_i > 0$ ($1 \le i \le q$) are the means of the q statistical populations. The second is that the overall mean is equal to the assumed value, $\mu = \mu_0$. The data in this case are arranged so that each column represents a different level of the factor being tested.
This homogeneity of the means may also be tested using the informational energy concept⁸⁻¹¹.
The one-way layout
Suppose some factor A, which we consider as having some effect on a
response variable of interest y, has q levels. An experiment is set up in which n
measurements are made of the response y at all levels. The q levels are called treatments or controlled factors, there being q controlled factors in the experimental design. Each result $y_{ij}$ can be written as the sum of a constant $\mu$ (the general mean), $\alpha_i$, a term which measures the effect of factor A at the ith level, and an error term $e_{ij}$, called the residual error or residual. The linear (or additive) model
$$y_{ij} = \mu + \alpha_i + e_{ij} \qquad (6)$$
can be written for the one-way layout. It is now necessary to test the null hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_q$. Let $\xi$ be a new random variable with q levels, each having an associated probability $p_i$:
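It can now be observed that the null hypothesis $H_0$ is equivalent to the hypothesis $H^*: p_1 = p_2 = \dots = p_q = 1/q$. The $H^*$ hypothesis is true when $E(\xi) = 1/q$, i.e., when the informational energy of the random variable $\xi$ is minimal.
If we define $p_i$ as
$$p_i = \frac{y_{i\cdot}}{\sum_{i=1}^{q} y_{i\cdot}}, \qquad y_{i\cdot} = \sum_{j=1}^{n} y_{ij}$$
and substitute these probabilities into Eqn. 2, the empirical informational energy of the random variable $\xi$ is given by the following expression:
$$\hat{E}(\xi) = \sum_{i=1}^{q} p_i^2 = \frac{\sum_{i=1}^{q} y_{i\cdot}^2}{\left(\sum_{i=1}^{q} y_{i\cdot}\right)^2}$$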
As the numerator of the expression for $\hat{E}(\xi)$ contains a sum of squares of random variables, the theorems of classical distribution theory can be used to construct a criterion¹² for testing the hypothesis $H^*: E(\xi) = 1/q$. If $E(\xi) = \hat{E}(\xi)$, the null hypothesis is accepted; if, on the other hand, $E(\xi) \neq \hat{E}(\xi)$, the null hypothesis is rejected and the effect of factor A is taken as significant.
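A minimal Python sketch of the one-way computation follows. It assumes that $p_i$ is the share of level $i$ in the grand total of the (positive) responses, and it only computes $\hat{E}(\xi)$ and the reference value $1/q$; the formal test criterion of ref. 12 is not reproduced. The name `one_way_ie` is hypothetical.

```python
def one_way_ie(groups):
    """Empirical informational energy of xi for a one-way layout.

    groups: q lists holding the positive responses y_ij observed at each level i.
    Assumes p_i = y_i. / sum_i y_i. (share of level i in the grand total).
    Returns (p, E_hat, 1/q); 1/q is the value E(xi) takes under H*.
    """
    totals = [sum(g) for g in groups]              # level totals y_i.
    if any(t <= 0 for t in totals):
        raise ValueError("level totals must be positive to act as probabilities")
    grand = sum(totals)
    p = [t / grand for t in totals]
    e_hat = sum(pi * pi for pi in p)               # = sum y_i.^2 / (sum y_i.)^2
    return p, e_hat, 1 / len(groups)

p, e_hat, e_null = one_way_ie([[4.0], [1.0], [1.0]])
print(e_hat, e_null)  # E_hat is about 0.5, well above 1/q = 0.333...
```

Equal level totals give $\hat{E}(\xi) = 1/q$ exactly; the further the totals are from one another, the closer $\hat{E}(\xi)$ moves towards 1.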
The two-way layout
Let us consider the case in which an experiment must be set up to study the effects of two factors A and B on a response variable y. Factor A has q levels, whereas factor B has m levels. For each combination of levels, we measure the response $y_{ij}$ by carrying out n observations. In cases with no replication, and if we assume that there is no interaction between the two factors, one may adopt the linear model
$$y_{ij} = \mu + \alpha_i + \beta_j + e_{ij} \qquad (10)$$
The hypothesis $H_0$ ($\alpha_i = 0$), i.e., that factor A has no significant effect, is equivalent to the hypothesis $H_A: p_1 = p_2 = \dots = p_q$. This is equivalent to $H_A^*: E(\xi) = 1/q$.
The estimated informational energy $\hat{E}(\xi)$ concerning the probabilities $p_i$ is given by the same expression as in the one-way layout, with $y_{i\cdot}$ the total for level i of factor A. The null hypothesis is then accepted when $E(\xi) = \hat{E}(\xi)$; hence all $\alpha_i$ values are equal to zero and the effect of factor A is not significant.
The hypothesis $\beta_j = 0$ ($j = 1, 2, \dots, m$), i.e., that factor B has no significant effect, is equivalent to the hypothesis $H_B: p'_1 = p'_2 = \dots = p'_m$, where
$$p'_j = \frac{y_{\cdot j}}{\sum_{j=1}^{m} y_{\cdot j}}$$
This is in turn equivalent to the hypothesis $H_B^*: E(\eta) = 1/m$, where $\eta$ is the random variable associated with the levels of factor B.
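The estimated informational energy concerning the probabilities $p'_j$ is given by
$$\hat{E}(\eta) = \sum_{j=1}^{m} p_j'^2 = \frac{\sum_{j=1}^{m} y_{\cdot j}^2}{\left(\sum_{j=1}^{m} y_{\cdot j}\right)^2}$$
If $E(\eta) \neq \hat{E}(\eta)$, the null hypothesis is rejected; hence the effect of factor B is significant.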
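The two-way case differs only in which totals are used: row totals $y_{i\cdot}$ for factor A and column totals $y_{\cdot j}$ for factor B. The sketch below (hypothetical helper `two_way_ie`, no replication assumed, all $y_{ij} > 0$) returns both empirical energies together with their reference values $1/q$ and $1/m$:

```python
def two_way_ie(table):
    """Empirical IE for factor A (rows, q levels) and factor B (columns, m levels)
    in a two-way layout without replication; table[i][j] = y_ij > 0.

    Returns (E_hat_A, 1/q, E_hat_B, 1/m)."""
    def ie(totals):
        s = sum(totals)
        return sum((t / s) ** 2 for t in totals)

    row_totals = [sum(row) for row in table]          # y_i.
    col_totals = [sum(col) for col in zip(*table)]    # y_.j
    return ie(row_totals), 1 / len(row_totals), ie(col_totals), 1 / len(col_totals)

# Identical rows (no A effect), unequal columns (a B effect)
print(two_way_ie([[1.0, 3.0], [1.0, 3.0]]))  # (0.5, 0.5, 0.625, 0.5)
```

In the example, $\hat{E}$ for factor A coincides with $1/q$, while $\hat{E}$ for factor B exceeds $1/m$.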
RESULTS AND DISCUSSION
To illustrate the potential of the informational analysis of variance presented above, we refer to the data discussed in ref. 1 concerning the effectiveness of the traditional water-bath digestion used in U.S. EPA method 7471 and of the microwave digestion method 3051.
The results obtained in the determination of mercury (ppm) in solid wastes by AAS using the two sample preparation methods (see Table 1) are compared through statistical parametric and non-parametric tests, informational analysis of variance and linear regression.
Paired t-Test
The paired t-test⁶,¹³ is particularly suitable for the statistical treatment of samples of highly varying composition. The t value is evaluated through the parameter $D_i$, calculated as the signed difference between the results obtained with the two methods for the same sample, and the mean $\bar{D}$ of all the individual differences $D_i$:
$$\bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i, \qquad t = \frac{\bar{D}\sqrt{n}}{s_D}$$
Table 1. Determination of mercury (ppm) in solid wastes by AAS using two standard methods of sample preparation discussed in ref. 1.

Sample   Method 3051                       Method 7471                       Di
         a       b       c       Mean      a       b       c       Mean
1        7.12    7.66    7.17    7.32      5.50    5.54    5.40    5.48      +1.84
2        16.1    15.7    15.6    15.8      13.1    12.8    13.0    13.0      +2.80
3        4.89    4.62    4.28    4.60      3.39    3.12    3.36    3.29      +1.31
4        9.64    9.03    8.44    9.04      6.59    6.52    7.43    6.84      +2.20
5        6.76    7.22    7.50    7.16      6.20    6.03    5.77    6.00      +1.16
6        6.19    6.61    7.61    6.80      6.25    5.65    5.61    5.84      +0.96
7        9.44    9.56    10.7    9.90      15.0    13.9    14.0    14.3      -4.40
8        30.8    29.0    26.2    28.7      20.4    16.1    20.0    18.8      +9.90
where $s_D$ is the standard deviation of the differences,
$$s_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n-1}}$$
For the pairs of means in Table 1, a t value of 1.431 was calculated. The tabulated value of the t-distribution at the 95% confidence level and 7 degrees of freedom is t = 2.365. It can be concluded that at this confidence level there is no significant difference between the two methods of sample preparation. This was the conclusion expressed in ref. 1.
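The paired t value quoted above can be reproduced from the Method 3051 and Method 7471 means of Table 1 with Python's standard `statistics` module. The sketch assumes $D_i$ = Method 3051 mean minus Method 7471 mean, as in the Di column:

```python
import math
from statistics import mean, stdev

# Mean Hg concentrations (ppm) from Table 1
m3051 = [7.32, 15.8, 4.60, 9.04, 7.16, 6.80, 9.90, 28.7]   # microwave digestion
m7471 = [5.48, 13.0, 3.29, 6.84, 6.00, 5.84, 14.3, 18.8]   # water-bath digestion

def paired_t(x, y):
    """Paired t statistic: mean of the signed differences D_i times sqrt(n),
    divided by their sample standard deviation s_D (n - 1 in the denominator)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) * math.sqrt(len(d)) / stdev(d)

t = paired_t(m3051, m7471)
print(round(t, 3))        # 1.431
print(abs(t) < 2.365)     # True: below the two-sided 95% critical value for 7 df
```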
Wilcoxon Matched-pair Signed-rank Test
All the Di values calculated (with regard to the sign) as the difference
between the means obtained with the two methods for the same sample are first
ranked without regard to the sign, starting with the smallest value. The sign of each $D_i$ is then restored. Under the null hypothesis that the two methods are equivalent, the sum ($T^+$) of the ranks of the positive $D_i$ is close to the sum ($T^-$) of the ranks of the negative $D_i$. The smaller the value of
$$T = \min(T^+, T^-)$$
the larger the significance of the difference⁶. The values calculated from the data in Table 1 are $T^+ = 29$ and $T^- = 7$. The tabulated value of T (n = 8 pairs) at the 95% confidence level is 4. This value is lower than $T_{\min} = 7$, and it can be concluded that, according to this test too, there is no significant difference between the two methods.
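The rank sums can be recomputed from the Di column of Table 1. The sketch below is a plain implementation of the signed-rank sums (zero differences dropped and ties given average ranks, a common convention assumed here); the critical value still has to be taken from a table:

```python
def signed_rank_sums(d):
    """Return (T+, T-) for the Wilcoxon matched-pair signed-rank test.

    Zero differences are dropped; tied |D_i| values share their average rank."""
    nz = [v for v in d if v != 0]
    order = sorted(range(len(nz)), key=lambda i: abs(nz[i]))
    ranks = [0.0] * len(nz)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied absolute values
        while j + 1 < len(order) and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of 1-based ranks i+1..j+1
        i = j + 1
    t_plus = sum(r for r, v in zip(ranks, nz) if v > 0)
    t_minus = sum(r for r, v in zip(ranks, nz) if v < 0)
    return t_plus, t_minus

# Di column of Table 1
d = [1.84, 2.80, 1.31, 2.20, 1.16, 0.96, -4.40, 9.90]
t_plus, t_minus = signed_rank_sums(d)
print(t_plus, t_minus, min(t_plus, t_minus))  # 29.0 7.0 7.0
```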
Mann-Whitney U-Test
The Mann-Whitney U-test⁶,¹³ is a non-parametric test for the comparison of methods 1 and 2 through $n_1$ and $n_2$ measurements performed with the two methods. All the data are ranked together, assigning rank 1 to the lowest value, rank 2 to the second lowest, and so on. For the two series of data, the rank sums $R_1$ and $R_2$ and the parameters $U_1$ and $U_2$ are calculated:
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2$$
The smaller of the two calculated U values is compared with the value tabulated for the U distribution. The calculated values are $U_1 = 24$ and $U_2 = 40$. As the tabulated value for $n_1 = n_2 = 8$ at the 95% confidence level, U = 13, is lower than the smaller calculated value, $U_1 = 24$, it is concluded that, according to this test too, there is no statistically significant difference between the results of the two methods.
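Likewise, $U_1$ and $U_2$ follow from ranking the sixteen Table 1 means together. In this sketch $U_1$ refers to Method 3051 and $U_2$ to Method 7471, an assignment that reproduces the values quoted above; the helper names are illustrative:

```python
def average_ranks(values):
    """1-based ranks of values, ties receiving the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U1, U2) from the rank sums R1, R2 of the pooled, jointly ranked data."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2

m3051 = [7.32, 15.8, 4.60, 9.04, 7.16, 6.80, 9.90, 28.7]   # Table 1 means
m7471 = [5.48, 13.0, 3.29, 6.84, 6.00, 5.84, 14.3, 18.8]
print(mann_whitney_u(m3051, m7471))  # (24.0, 40.0)
```

A quick consistency check is that $U_1 + U_2 = n_1 n_2$ (here 64).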
Informational analysis of variance
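The null hypothesis in this case is equivalent to the hypothesis
$$H^*: p_1 = p_2 = \dots = p_8$$
where the probabilities $p_i$ are calculated as in the one-way layout. This is equivalent to $E(\xi) = 1/q$, i.e., $E(\xi) = 1/8 = 0.125$.
The empirical informational energy associated with the probabilities $p_i$ is given by
$$\hat{E}(\xi) = \sum_{i=1}^{8} p_i^2 = \frac{\sum_{i=1}^{8} D_i^2}{\left(\sum_{i=1}^{8} |D_i|\right)^2} = \frac{\sum_{i=1}^{8} D_i^2}{603.6849} = 0.228 \qquad (17)$$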
As $E(\xi) \neq \hat{E}(\xi)$ (0.125 versus 0.228), the difference between the two methods of sample preparation is significant, and it is concluded that there is an important method effect, which means that one method shows a bias. This result, which contradicts the other parametric and non-parametric tests, is confirmed by ordinary linear regression and by principal component regression (see the next section).
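The text does not state explicitly how the probabilities are formed from the data. Taking $p_i = |D_i| / \sum |D_i|$ from the Di column of Table 1 (an assumption) reproduces both the denominator $603.6849 = (\sum |D_i|)^2$ and $\hat{E}(\xi) = 0.228$ of Eqn. 17, as the following Python sketch shows:

```python
# Di column of Table 1 (Method 3051 mean minus Method 7471 mean)
d = [1.84, 2.80, 1.31, 2.20, 1.16, 0.96, -4.40, 9.90]

def ie_of_differences(d):
    """Empirical IE with the assumed probabilities p_i = |D_i| / sum |D_i|.
    Returns ((sum |D_i|)^2, E_hat, 1/q)."""
    abs_d = [abs(v) for v in d]
    total = sum(abs_d)
    p = [v / total for v in abs_d]
    e_hat = sum(pi * pi for pi in p)     # equals sum D_i^2 / (sum |D_i|)^2
    return total ** 2, e_hat, 1 / len(d)

denom, e_hat, e_null = ie_of_differences(d)
print(round(denom, 4), round(e_hat, 3), e_null)  # 603.6849 0.228 0.125
```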
Linear Regression Analysis
It is well known that the application of the paired t-test in comparing two methods over a wide range of concentrations is inappropriate, because the validity of this parametric test rests on the assumption that any errors, either random or systematic, are independent of the concentration. The preferred method in such cases is linear regression (y = a + bx). Applying the ordinary linear regression method (LS) to the data in Table 1, we obtained the statistical results presented in Table 2.
Table 2. Regression analysis results concerning the comparison of the two methods discussed in ref. 1.

Parameter        Estimate   Standard error   T value   Probability level
Intercept (a)    2.279      1.7529           1.300     0.24126
Slope (b)        0.619      0.1313           4.716     0.00327

95% confidence limits for a:   a ± t·s_a = 2.279 ± 4.282
95% confidence limits for b:   b ± t·s_b = 0.619 ± 0.318
Standard error of estimate:    s_y/x = 2.71715
Correlation coefficient:       r = 0.8874
R-squared:                     R² = 78.75%

Hypothesis tests for the regression parameters were carried out using the familiar T-test procedure¹³,¹⁴. For example, if we are interested in testing $H_0: a = 0$, i.e., testing the significance of the intercept, a T-statistic can be constructed from the sample estimate a in the usual way, i.e., $T = (a - 0)/s_a$, where $s_a$ is the standard error of a. The significance levels (P-values) given in the last column of Table 2 pertain to T-tests that the corresponding coefficients are zero. For a, the P-value is 0.24126, indicating that the intercept does not differ significantly from the "ideal" value of 0. In contrast, the P-value of 0.00327 for b indicates a highly significant slope. The confidence intervals for the regression parameters confirm that the intercept does not differ significantly from 0 (its confidence interval includes 0), whereas the slope (0.619) differs significantly from the "ideal" value of 1 (its confidence interval does not include 1).
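The estimates in Table 2 are consistent with regressing the Method 7471 means (y) on the Method 3051 means (x) of Table 1; that assignment of x and y is inferred from the numbers rather than stated in the text. A standard-library sketch of the ordinary least-squares computation follows (P-values and confidence limits need Student-t quantiles and are omitted):

```python
import math

# Table 1 means; regressing Method 7471 (y) on Method 3051 (x) is an inferred choice
x = [7.32, 15.8, 4.60, 9.04, 7.16, 6.80, 9.90, 28.7]
y = [5.48, 13.0, 3.29, 6.84, 6.00, 5.84, 14.3, 18.8]

def ols(x, y):
    """Ordinary least squares y = a + b x with standard errors and r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    s_yx = math.sqrt((syy - b * sxy) / (n - 2))           # standard error of estimate
    s_b = s_yx / math.sqrt(sxx)
    s_a = s_yx * math.sqrt(sum(u * u for u in x) / (n * sxx))
    r = sxy / math.sqrt(sxx * syy)
    return {"a": a, "b": b, "s_a": s_a, "s_b": s_b, "s_yx": s_yx, "r": r}

res = ols(x, y)
print({k: round(v, 4) for k, v in res.items()})
# a ~ 2.279, b ~ 0.619, s_a ~ 1.753, s_b ~ 0.1313, s_yx ~ 2.7172, r ~ 0.8874
t_a, t_b = res["a"] / res["s_a"], res["b"] / res["s_b"]   # T values ~ 1.300 and 4.716
```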
An overall assessment of the quality of the linear regression is provided by
the coefficient of determination, R-squared. When multiplied by 100, it gives the
percentage of the variability observed in the results obtained with the tested method
which is explained by the linear regression on the more precise method.
All the statistical results presented in Table 2 (including the value of R-squared, namely 78.75%) illustrate the presence of the proportional errors introduced by the microwave digestion method. Moreover, the correlation coefficient between the mean values obtained using the traditional water-bath digestion method and the differences $D_i$ (r = 0.9323) illustrates once more the presence of proportional errors.
Owing to the lack of reliability of the standard regression model's estimates of the constant and proportional errors (bias) in some analytical situations, alternative models have been proposed in recent years¹⁵⁻¹⁹.
To take into account the variance in both methods, we computed the principal component regression (PCR), because the standard deviations of the measurements were shown to be the same¹. The confidence intervals obtained for the intercept (a = 1.836 ± 4.293) and the slope (b = 0.668 ± 0.322) also confirm the presence of a proportional bias. The same result was obtained using the Abbe statistical test²⁰ and a new approach discussed recently by Nilsson²¹ (see the next sections).
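Assuming that PCR here denotes the orthogonal (major-axis) fit, i.e. the line along the first principal component of the centred (x, y) data, which is the usual choice when both methods have equal error variance, the slope can be sketched as follows with the same x and y assignment as for Table 2. On the Table 1 means it gives b ≈ 0.668, the slope reported above:

```python
import math

def major_axis(x, y):
    """Orthogonal (major-axis) regression y = a + b x: the line through the centroid
    along the first principal component of the (x, y) scatter. Assumes sxy != 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    # slope of the eigenvector belonging to the largest eigenvalue of the scatter matrix
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return my - b * mx, b

x = [7.32, 15.8, 4.60, 9.04, 7.16, 6.80, 9.90, 28.7]   # Method 3051 means
y = [5.48, 13.0, 3.29, 6.84, 6.00, 5.84, 14.3, 18.8]   # Method 7471 means
a, b = major_axis(x, y)
print(round(b, 3))  # 0.668
```

Unlike ordinary least squares, this fit is symmetric in x and y, which is why it is preferred when neither method can be treated as error-free.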
Abbe Test
In this test, which is also used when the standard deviations of the compared methods are the same, one calculates the statistic A,
where $D_i$ and $\bar{D}$ are the differences between the means of the two methods and the mean of these differences, respectively. By applying this test at the 5% significance level and 8
degrees of freedom, one obtains the same result, i.e. the presence of a proportional bias, because the calculated A value (0.2714) is lower than the tabulated value (0.4912).
Mean square successive difference approach
This algorithm, discussed recently by Nilsson²¹, considers the differences $z_i = \ln y_i - \ln x_i$, $i = 1, 2, \dots, n$, ordered according to the mean concentrations obtained with the two methods, $(x_i + y_i)/2$, or according to the concentration of the reference method, $x_i$, if it has a better precision or is supposed to give conventionally true values. Pooling the estimate from all the consecutive differences gives the variance estimate based on the mean square successive difference
$$s_{MSSD}^2 = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} (z_{i+1} - z_i)^2 \qquad (19)$$
which should be an estimate of the same variance as
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})^2 \qquad (20)$$
when the bias function is constant. If there is a gradual change in the bias function, $s^2$ will tend to be larger than $s_{MSSD}^2$, and this can be used to test whether the bias function is constant or not. One of the main conclusions of this approach is that if $s_{MSSD} < 0.8s$, the bias function is not constant. Applying this procedure to our data, we obtained $s_{MSSD} < 0.8s$, because $s_{MSSD}^2 = 0.0118$ and $s^2 = 0.0216$. Even if the largest difference is eliminated, the result is the same, i.e. the presence of a proportional bias introduced by the microwave digestion method.
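Nilsson's check can be sketched in a few lines. The sketch assumes the usual mean-square-successive-difference estimate $s_{MSSD}^2 = \sum_{i=1}^{n-1} (z_{i+1} - z_i)^2 / (2(n-1))$, orders the pairs by $(x_i + y_i)/2$ by default, and applies the $s_{MSSD} < 0.8s$ rule; `mssd_test` is a hypothetical name. The usage example uses synthetic data with a steadily drifting bias rather than the paper's data:

```python
import math

def mssd_test(x, y, order_by=None):
    """Mean-square-successive-difference check on z_i = ln y_i - ln x_i.

    Pairs are ordered by order_by (default: the means (x_i + y_i) / 2).
    Returns (s2_mssd, s2, bias_not_constant) using the s_MSSD < 0.8 s rule."""
    keys = order_by if order_by is not None else [(a + b) / 2 for a, b in zip(x, y)]
    idx = sorted(range(len(x)), key=lambda i: keys[i])
    z = [math.log(y[i]) - math.log(x[i]) for i in idx]
    n = len(z)
    s2_mssd = sum((z[i + 1] - z[i]) ** 2 for i in range(n - 1)) / (2 * (n - 1))
    zbar = sum(z) / n
    s2 = sum((v - zbar) ** 2 for v in z) / (n - 1)
    return s2_mssd, s2, math.sqrt(s2_mssd) < 0.8 * math.sqrt(s2)

# Synthetic check: a bias that drifts steadily with concentration
xs = [float(i) for i in range(1, 9)]
ys = [xv * math.exp(0.1 * i) for i, xv in enumerate(xs)]
print(mssd_test(xs, ys))  # s2_mssd ~ 0.005, s2 ~ 0.06, flagged as not constant
```

With a smooth drift the successive differences stay small while the overall spread of $z_i$ grows, so $s_{MSSD}$ falls well below $0.8s$; with a constant bias plus random scatter the two estimates remain comparable.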
CONCLUSIONS
A new approach for the analysis of method comparisons over a wide range of concentrations was discussed and compared with parametric and non-parametric tests and with two linear regression methods, namely LS and PCR.
The parametric and non-parametric tests showed no difference between the traditional water-bath digestion and the alternative microwave technique. In contrast, the regression methods LS and PCR, as well as the informational analysis of variance, revealed a significant difference. The same result was obtained by applying a new approach discussed recently in the analytical literature, which appears to be very similar to the Abbe test. Hence, it can be concluded that the microwave technique introduces a proportional bias.
REFERENCES
1. R. Maw, L. Witry and T. Emond, Spectroscopy, 9, 39 (1994).
2. O. Onicescu, C. R. Acad. Sci., Ser. A, 263, 841 (1966).
3. O. Onicescu and V. Ştefănescu, Informational Statistics, Editura Tehnică, Bucharest, 1979.
4. O. Onicescu, Rev. Statist., 11, 4 (1966).
5. V. Ştefănescu, Applications of Informational Energy and Correlation, Editura Academiei, Bucharest, 1979.
6. D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte and L. Kaufman, Chemometrics: a Textbook, Elsevier, Amsterdam, 1988.
7. V. Ştefănescu, Applications of Informational Energy and Correlation, Editura Academiei, Bucharest, 1979, Anexa 2.
8. C. Sârbu and H. Naşcu, Rev. Chim. (Bucharest), 41, 276 (1990).
9. C. Sârbu and H. Naşcu, Rev. Roum. Chim., 37, 945 (1992).
10. D. Dumitrescu, C. Sârbu and H. Pop, Anal. Lett., 26, 123 (1994).
11. C. Sârbu, Anal. Chim. Acta, 271, 269 (1993).
12. I. Văduva, The Analysis of Variance, Editura Tehnică, Bucharest, 1970.
13. J.C. Miller and J.N. Miller, Statistics for Analytical Chemistry, Horwood, New York, 1988.
14. M. Thompson, Analyst, 107, 1169 (1982).
15. H. Passing and W. Bablok, J. Clin. Chem. Clin. Biochem., 21, 709 (1983).
16. H. Passing and W. Bablok, J. Clin. Chem. Clin. Biochem., 22, 431 (1984).
17. H. Passing and W. Bablok, J. Clin. Chem. Clin. Biochem., 26, 783 (1988).
18. C. Hartmann, J. Smeyers-Verbeke and D.L. Massart, Analusis, 21, 125 (1993).
19. A.H. Kalantar, B.R. Gelb and J.S. Alper, Talanta, 42, 597 (1995).
20. C. Sârbu, V. Liteanu and D. Pop, Studia (Chemia), XXXVII, 13 (1992).
21. G. Nilsson, J. Chemom., 5, 523 (1991).
Received: January 17, 1996 Accepted: December 15, 1996