Analytica Chimica Acta 555 (2006) 348–353
Hierarchical classification designs for the estimation of different sources of variability in proficiency testing experiments
Pedro Araujo*, Livar Frøyland
National Institute of Nutrition and Seafood Research (NIFES), P.O. Box 2029 Nordnes, N-5817 Bergen, Norway
Received 25 April 2005; received in revised form 5 August 2005; accepted 9 September 2005. Available online 18 October 2005.
Abstract
An approach to the estimation of the possible different sources of variation found in proficiency testing experiments is described. Four errors, namely technique, analyst, laboratory and geographical location, are considered and calculated by using a rational experimental design based on hierarchical classification. The treatment of the confidence of the design over different experimental arrangements is explored and visualised by calculating a function that depends only on the design and not on the experimental response. An illustrative example based on simulated data is used to show how the theory could be applied in practice.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Proficiency testing (PT); Experimental design; Hierarchical classification; Sources of variability; Accreditation; Performance parameter; Quality assurance; Quality control; Confidence of the experimental design

* Corresponding author. Tel.: +47 55905115; fax: +47 55905299. E-mail address: [email protected] (P. Araujo).

0003-2670/$ – see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.aca.2005.09.024

1. Introduction

Proficiency testing (PT) is a quality assurance measure used to monitor the performance and competence of individual laboratories. It can also be used to enhance measurement quality and to determine areas in which improvement may be needed. An independent coordinator distributes individual test portions of a typical uniform material, and the participant laboratories analyse the material by their method of choice and return the results to the coordinator [1,2]. If the purpose of the PT test is to improve quality, it is imperative that the analysts in the laboratories choose a strategy that reduces experimental error, so that they can model their results with great confidence. Proficiency testing is principally concerned with applying a quantitative model which assesses how good or bad a laboratory is at estimating the concentration of a compound, while the performance of the laboratories is evaluated based on the differences between their results and either the true value or the consensus value [3].

Traditionally, the experimental schemes used in proficiency testing are quite rigid; the assigned value of the test material can be established in several ways, but it can be calculated from analytical results as a consensus value only if the uncertainty in this value is acceptable, and this often means that a large number of participant laboratories (>20–30) are involved. This prerequisite can exclude countries with a small number of laboratories ready to participate [4], although there are many international PT schemes in operation. Experimental design for proficiency testing can be interested not only in the discrepancies between assigned and found sample values but also in assessing analyst or laboratory variability against, for instance, spatial variability in the sample. Designs of experiments for proficiency testing are influenced by several factors, and a good design attempts to determine how significant these factors are in influencing the quality of the results and how large each source of error is, while in addition providing a tool which can assist in concentrating efforts efficiently [5]. The literature on experimental design is ample and well documented. The methods described in books and journals are advocated by statisticians and have certain properties that make them powerful tools for studying a wide diversity of problems in many scientific and industrial areas, ranging from psychology to chemistry [6–8]. However, in spite of the multiple advantages that experimental design techniques can bring about, their application in proficiency testing has not enjoyed widespread popularity, probably due to the scarce number of publications on efficient designs applicable either to a large or to a limited number of participants. In this article a flexible experimental
design based on hierarchical classification for proficiency testing is studied and analysed in terms of the different sources of variation involved in the experiments. Special emphasis has been given to the decomposition, calculation and explanation of the different potential sources of variation and the estimation of their significance. Artificial data and fictive analysts, laboratories and locations are used to illustrate how the described theory could be applied in proficiency testing experiments.
2. Experimental design
An ideal experimental strategy to determine the concentration of a compound in a test material and to compare interlaboratory performance, as well as different potential sources of variability, is illustrated in Fig. 1. This particular design is based on the hypothesis that there are 5 distinct components of variability. These consist of I countries (I = A, B, C, ..., Z) where the experiments are performed. In each country i, J_i laboratories are engaged in the determination of a particular element (J_i = 3) which is in a homogeneous powder sample that is analysed by K_i analysts in each laboratory (K_i = 3) using L_ijk different techniques (L_ijk = 3). A total number of M_ijkl instrumental measures is performed (M_ijkl = n) with each instrumental technique. According to this notation, the m-th measurement in country i, laboratory j, by the analyst k who used the technique l is recorded as x_ijklm.
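As a concrete illustration of this notation, the nested measurements can be held in a simple keyed structure. The Python sketch below is not part of the original article; the country, laboratory, analyst and technique labels and all numerical values are invented for illustration:

```python
# Nested PT results keyed by (country i, laboratory j, analyst k,
# technique l); each value is the list of replicates x_ijklm.
# All identifiers and numbers are illustrative, not real data.
x = {
    ("A", 1, 1, "HPLC"): [10.2, 10.4, 10.3],
    ("A", 1, 1, "MS"):   [10.1, 10.2, 10.2],
    ("A", 1, 2, "HPLC"): [10.5, 10.6, 10.4],
}

# Technique-level averages: the mean of the replicates in each cell.
xbar = {key: sum(v) / len(v) for key, v in x.items()}
print(round(xbar[("A", 1, 1, "HPLC")], 2))  # 10.3
```

Keying on the full index tuple keeps the unbalanced case (different numbers of laboratories, analysts or techniques per branch) trivial to represent.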
Fig. 1. Hierarchical classification experimental design to study different sourcesof variability in proficiency testing experiments.
3. Sources of variability

The following averages can be defined from Fig. 1:

• average at each technique level:

\bar{x}_{ijkl} = \frac{\sum_{m=1}^{M_{ijkl}} x_{ijklm}}{M_{ijkl}}    (1)

• average at each analyst level:

\bar{\bar{x}}_{ijk} = \frac{\sum_{l=1}^{L_{ijk}} \bar{x}_{ijkl}}{L_{ijk}}    (2)

• average at each laboratory level:

\bar{\bar{\bar{x}}}_{ij} = \frac{\sum_{k=1}^{K_{ij}} \bar{\bar{x}}_{ijk}}{K_{ij}}    (3)

• average at each country:

\bar{\bar{\bar{\bar{x}}}}_{i} = \frac{\sum_{j=1}^{J_i} \bar{\bar{\bar{x}}}_{ij}}{J_i}    (4)

• average between countries:

\bar{\bar{\bar{\bar{\bar{x}}}}} = \frac{\sum_{i=1}^{I} \bar{\bar{\bar{\bar{x}}}}_{i}}{I}    (5)

Several sources of variability based on this model can be calculated as mean square errors. According to Fig. 1, four sources of variability, specifically technique, analyst, laboratory and country, are estimated.

3.1. Technique

The variability due to the instrumental technique is defined by

S_{wt}^2 = \frac{\sum_{l=1}^{L_{ijk}} \sum_{m=1}^{M_{ijkl}} (x_{ijklm} - \bar{x}_{ijkl})^2}{\sum_{l=1}^{L_{ijk}} (M_{ijkl} - 1)}    (6)

S_{bt}^2 = \frac{\sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} \left[ M_{ijkl} (\bar{x}_{ijkl} - \bar{\bar{x}}_{ijk})^2 \right]}{L_{ijk} - 1}    (7)

where S_{wt}^2 and S_{bt}^2 are the mean square errors within and between instrumental techniques.
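Eqs. (6) and (7) can be sketched numerically for a single analyst (fixed i, j, k). In the following Python illustration, which is not from the original article, three techniques each contribute triplicate measurements; all numbers are invented:

```python
# Replicates per technique for one analyst; invented data.
tech = {"HPLC":  [10.2, 10.4, 10.3],
        "MS":    [10.1, 10.2, 10.2],
        "ELISA": [10.8, 10.9, 10.7]}

xbar = {l: sum(v) / len(v) for l, v in tech.items()}  # Eq. (1) averages
xbarbar = sum(xbar.values()) / len(xbar)              # Eq. (2) average

# Eq. (6): within-technique mean square, pooled over techniques.
s2_wt = (sum((xm - xbar[l]) ** 2 for l, v in tech.items() for xm in v)
         / sum(len(v) - 1 for v in tech.values()))

# Eq. (7): between-technique mean square for this analyst.
s2_bt = (sum(len(v) * (xbar[l] - xbarbar) ** 2 for l, v in tech.items())
         / (len(tech) - 1))

print(round(s2_wt, 4), round(s2_bt, 4))
```

With these invented values the between-technique mean square is much larger than the within-technique one, the pattern an F-test would later flag as significant.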
3.2. Analyst

To compare the different analysts and establish whether the reported values might all belong to the same population regardless of the analysts, the mean square errors within analysts (S_{wa}^2) and between analysts (S_{ba}^2) are calculated by using the following equations:

S_{wa}^2 = \frac{\sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} \sum_{m=1}^{M_{ijkl}} (x_{ijklm} - \bar{\bar{x}}_{ijk})^2}{\left( \sum_{l=1}^{L_{ijk}} M_{ijkl} \right) - K_{ij}}    (8)

S_{ba}^2 = \frac{\sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} \left[ M_{ijkl} \left( \bar{\bar{x}}_{ijk} - \bar{\bar{\bar{x}}}_{ij} \right)^2 \right]}{K_{ij} - 1}    (9)
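A similar sketch applies to the within-analyst term of Eq. (8), here for one laboratory with two analysts, pooling each analyst's measurements over techniques. This Python fragment is illustrative only and not from the original article; the data are invented:

```python
# All measurements of each analyst in one laboratory, pooled over
# that analyst's techniques; invented data.
analysts = {1: [10.2, 10.4, 10.3, 10.1],
            2: [10.6, 10.5, 10.7, 10.6]}

K = len(analysts)
xbb = {k: sum(v) / len(v) for k, v in analysts.items()}  # Eq. (2) averages

# Eq. (8): within-analyst mean square; the denominator is the total
# number of measurements minus the number of analysts K.
s2_wa = (sum((xm - xbb[k]) ** 2 for k, v in analysts.items() for xm in v)
         / (sum(len(v) for v in analysts.values()) - K))
print(round(s2_wa, 4))  # 0.0117
```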
3.3. Laboratories

Another factor that may be involved is the influence of the laboratories. The mathematical equations summarising the variance within (S_{wlab}^2) and between (S_{blab}^2) laboratories may be expressed by the following equations:

S_{wlab}^2 = \frac{\sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} \sum_{m=1}^{M_{ijkl}} \left( x_{ijklm} - \bar{\bar{\bar{x}}}_{ij} \right)^2}{\left( \sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} M_{ijkl} \right) - J_i}    (10)

S_{blab}^2 = \frac{\sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} M_{ijkl} \left( \bar{\bar{\bar{x}}}_{ij} - \bar{\bar{\bar{\bar{x}}}}_{i} \right)^2}{J_i - 1}    (11)

Note that in Eqs. (10) and (11) the subscript 'lab' instead of 'l' has been used to designate laboratories, to avoid confusion with the term 'l' used to indicate technique.

3.4. Countries

The final possible factor is the influence of the countries. The mathematical equations associated with the variance within (S_{wc}^2) and between (S_{bc}^2) countries may be expressed by the following equations:

S_{wc}^2 = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} \sum_{m=1}^{M_{ijkl}} \left( x_{ijklm} - \bar{\bar{\bar{\bar{x}}}}_{i} \right)^2}{\left( \sum_{i=1}^{I} \sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} M_{ijkl} \right) - 1}    (12)

S_{bc}^2 = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J_i} \sum_{k=1}^{K_{ij}} \sum_{l=1}^{L_{ijk}} M_{ijkl} \left( \bar{\bar{\bar{\bar{x}}}}_{i} - \bar{\bar{\bar{\bar{\bar{x}}}}} \right)^2}{I - 1}    (13)

The numerators of Eqs. (6), (7), (9), (11) and (13), which are the sums of squares for the within-technique error (E_wt), between-technique error (E_bt), between-analyst error (E_ba), between-laboratory error (E_blab) and between-countries error (E_bc), have the property that their summation represents the overall design error (E_o):

E_o = E_wt + E_bt + E_ba + E_blab + E_bc    (14)

The significance of the various sums of squares is analysed using the F-test and comparing them with the tabulated 95% confidence level.

4. General model and design matrix

The general model for the design given in Fig. 1 can be written as

x = \mu + r_c + r_{lab} + r_a + r_t    (15)

For this model x is the experimental response measured at the country level, \mu represents the true or the consensus value, and the other terms are the residual deviations between the observed and the predicted response at each country (r_c), laboratory (r_lab), analyst (r_a) and technique (r_t) level. The overall sum of squares of these residuals was defined in the previous section (Eq. (14)).

The various possible designs generated from Fig. 1 by combining different parameter levels can be represented in matrix notation. For instance, for two countries participating in a proficiency testing trial, with two laboratories per country, two analysts per laboratory and two instrumental techniques per analyst (for the sake of simplicity instrumental replication is not considered), the total number of unique experiments will be 16 (I × J × K × L) and the design matrix can be represented as follows:

X =
| 1  1  1  1  1 |
| 1  1  1  1 −1 |
| 1  1  1 −1  1 |
| 1  1  1 −1 −1 |
| 1  1 −1  1  1 |
| 1  1 −1  1 −1 |
| 1  1 −1 −1  1 |
| 1  1 −1 −1 −1 |
| 1 −1  1  1  1 |
| 1 −1  1  1 −1 |
| 1 −1  1 −1  1 |
| 1 −1  1 −1 −1 |
| 1 −1 −1  1  1 |
| 1 −1 −1  1 −1 |
| 1 −1 −1 −1  1 |
| 1 −1 −1 −1 −1 |

where X represents the design matrix of parameters; the first column represents the average response, and the second, third, fourth and fifth columns represent the studied parameters, namely countries, laboratories/country, analysts/laboratory and techniques/analyst. Every row of the design matrix represents a particular experimental condition.

5. Confidence of the experimental design

The experimental errors which have been used to provide an estimate of the influence of the parameters might also be used
to estimate the confidence of the experimental design by means of the Working and Hotelling confidence limits [9], expressed as

x_{\pm} = S_o \sqrt{m \times F_{m,n-m} \times x_n (X'X)^{-1} x_n'}    (16)

where S_o, F, m and n are the root mean square overall error, the Fisher variance ratio, the number of parameters considered and the number of measurements, respectively. The term x_n represents the n-th row of the design matrix X. The term (X'X)^{-1} is the dispersion matrix, and the term x_n (X'X)^{-1} x_n' is a scalar called the leverage (h) that measures the potential influence of an observation on the parameters estimated and consequently the confidence in the experimental design. The lower the leverage, the higher the confidence in the experimental design, as is shown in Eq. (16). The term h depends only on the design and not on the experimental response and allows us, without performing any measurement, to visualise how confidently a design predicts data in an experimental region. It is possible to derive an equation for the leverage and display graphically the confidence of the experimental design in the following manner:

• Calculate the matrix (X'X)^{-1}. The dispersion matrix for the aforementioned 2 × 2 × 2 × 2 model is

(X'X)^{-1} =
| 0.063  0      0      0      0     |
| 0      0.063  0      0      0     |
| 0      0      0.063  0      0     |
| 0      0      0      0.063  0     |
| 0      0      0      0      0.063 |

• Label the rows and columns of the matrix (X'X)^{-1} according to the terms of the vector x. The vector (1  x_c  x_lab  x_a  x_t), representing the five parameters studied in Eq. (15), is used to label the columns and rows of the dispersion matrix as follows:

        1      x_c    x_lab  x_a    x_t
1     | 0.063  0      0      0      0     |
x_c   | 0      0.063  0      0      0     |
x_lab | 0      0      0.063  0      0     |
x_a   | 0      0      0      0.063  0     |
x_t   | 0      0      0      0      0.063 |

• Obtain the leverage equation coefficients by adding together the relevant values. The diagonal of the matrix (X'X)^{-1} represents the squared-term coefficients of the h equation; linear and crossed-term coefficients are obtained by adding together the off-diagonal values of the form x_AB + x_BA. In the present example the linear and crossed terms were all zero. The equation for the leverage of the 2 × 2 × 2 × 2 model generated from Fig. 1, using the labelled dispersion matrix described above, is

h = 0.063 \times (1 + x_c^2 + x_{lab}^2 + x_a^2 + x_t^2)    (17)

The graphical representation of the leverage values as a function of the whole experimental domain can be computed using Eq. (17). A graphical representation of the leverage as a function of the variables x_a and x_t is shown in Fig. 2; the levels of x_c and x_lab in Fig. 2 were kept at +1. Note that through these representations it is possible to study the confidence of the design outside of the experimental region.

Fig. 2. Representation of the leverage as a function of the parameters analyst and technique for a 2 × 2 × 2 × 2 design.

The reader interested in experimental design confidence studies is referred to a comprehensive article on the subject published elsewhere [10].

6. Application

The main purpose of the present application is to show how a particular array of experiments may be treated using the principles of experimental design, and how the results can be interpreted in the same way as for conventional designs. The applicability of the equations described above is demonstrated by using some simulated data presented in Fig. 3, for instance, for the determination of prostaglandin E2 (PGE2) by different techniques such as high performance liquid chromatography (HPLC), mass spectrometry (MS) and enzyme-linked immunosorbent assay (ELISA). It can be seen that Fig. 3 has been generated from Fig. 1 by introducing an unequal number of laboratory, analyst and technique levels per location. The total number of experiments according to the unbalanced design shown in Fig. 3 is 45 (2 × 1.5 × 2 × 2.5 × 3) and can be described as follows: proficiency testing experiments were carried out in two locations A and B (I = 2), where one and two laboratories, respectively (J1 = 1 and J2 = 2), were involved in the determination of PGE2 in standards. Three analysts were working at location A, laboratory α (K11 = 3), and at location B two and one analysts were working at laboratories β and γ, respectively (K21 = 2 and K22 = 1), using L_ijk different techniques, namely HPLC, MS and ELISA (L111 = L112 = L113 = 3, L211 = L212 = 2 and L221 = 2). Triplicate measurements were performed (M_ijkl = 3) and the concentration of PGE2 in pg/µl is represented as x_ijklm.
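The level counts of the unbalanced design just described can be checked with a short script. The nesting below mirrors the text (locations A and B; laboratories α, β, γ); the code is an illustration and not from the original article, and the assignment of the two techniques HPLC and MS to the analysts at laboratories β and γ is an assumption:

```python
# Fig. 3 tree: location -> laboratory -> analyst -> techniques used.
design = {
    "A": {"alpha": {1: ["HPLC", "MS", "ELISA"],
                    2: ["HPLC", "MS", "ELISA"],
                    3: ["HPLC", "MS", "ELISA"]}},
    "B": {"beta":  {1: ["HPLC", "MS"],
                    2: ["HPLC", "MS"]},
          "gamma": {1: ["HPLC", "MS"]}},
}
M = 3  # triplicate measurements per (analyst, technique) cell

# Total experiments: 15 technique cells x 3 replicates = 45.
cells = sum(len(t) for labs in design.values()
            for analysts in labs.values()
            for t in analysts.values())
print(cells * M)  # 45
```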
Fig. 3. Experimental design used to illustrate the calculation of the different sources of variability.
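The leverage computation of Section 5 can be reproduced numerically. The Python sketch below (not part of the original article) builds the balanced 2 × 2 × 2 × 2 design matrix, forms the dispersion matrix (X′X)⁻¹ and evaluates h at a design point; for this orthogonal design every diagonal element of the dispersion matrix is 1/16 = 0.0625 (quoted as 0.063 in the text) and every design point has h = 5/16:

```python
from itertools import product

# 16x5 design matrix: average column plus xc, xlab, xa, xt at +/-1.
X = [[1, c, lab, a, t] for c, lab, a, t in product([1, -1], repeat=4)]

n, m = len(X), len(X[0])

# X'X; for this orthogonal design it equals 16 times the identity.
XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)]
       for i in range(m)]
assert all(XtX[i][j] == (16 if i == j else 0)
           for i in range(m) for j in range(m))

# Dispersion matrix (X'X)^-1: diagonal with 1/16 on the diagonal.
disp = [[1 / XtX[i][i] if i == j else 0.0 for j in range(m)]
        for i in range(m)]

def leverage(x, d):
    """h = x (X'X)^-1 x' for a single design row x (cf. Eq. (16))."""
    return sum(x[i] * d[i][j] * x[j]
               for i in range(len(x)) for j in range(len(x)))

# Eq. (17): h = 0.0625 * (1 + xc^2 + xlab^2 + xa^2 + xt^2) = 5/16 here.
print(leverage(X[0], disp))  # 0.3125
```

For an unbalanced design such as that of Fig. 3, X′X is no longer a multiple of the identity and a general matrix inverse would be needed, which is why the confidence surface of Fig. 4 differs in shape from Fig. 2.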
7. Discussion

The confidence of the experimental design shown in Fig. 3 was estimated by calculating the equation for the leverage (h) as explained above. By comparison with Fig. 2, it can be seen in Fig. 4 that the shape of the confidence is strongly influenced by introducing an unequal number of parameters at the different levels studied, and that the highest confidence is no longer in the centre of the design. In spite of the latter fact, Fig. 4 shows uniform confidence regions with leverage values lower than those from the balanced design shown in Fig. 2. Similar trends to those observed in Fig. 4 were obtained when the other parameters studied were plotted against each other.

Fig. 4. Representation of the leverage as a function of the parameters analyst and technique for the design shown in Fig. 3.

Table 1 shows the main error terms and their significance for the results presented in Fig. 3. In general, using these simulated data we can see that the variability due to different analysts is higher than the variability due to the different techniques. There has been some controversy regarding the use of different techniques for proficiency testing. The significant difference at the 95% confidence level found between HPLC and MS at location B, laboratory γ, would in this simulated case support the belief that the use of the same technique leads to better agreement between results in proficiency testing than the situation where different techniques are employed [11]. Another possible explanation for the observed variability between HPLC and MS could be the few degrees of freedom available to judge the performance of these techniques in laboratory γ. It would be advisable to increase the number of measurements, introduce additional techniques in this particular laboratory or investigate how the analyst has implemented the methods.

The simulated data do not show any statistical difference when similar instrumental techniques are used. In spite of this result, the possibility cannot be precluded that in practice a significant difference might occur as a result of the matrix of the material used. A recent metrological study has shown the prominent role of the sample matrix when routine techniques are used [12].

Table 1 shows that in terms of analysts, laboratories and locations there were not any differences among the means of the samples that were subject to different instrumental techniques, since the F critical values were not exceeded. In practice, some significant variability could be expected due, for instance, to the tendency of the immunological assay to overestimate the levels of prostaglandin E2 as a result of cross-reactivity or its inherent variability in the quantification of sequential samples [13].

The hierarchical classification design presented in Fig. 1 is quite versatile. It can be implemented on a large number of laboratories or can be modified to allow the participation of a limited number of laboratories. In addition, the hierarchy of the factors under study can be changed if a different visualisation is required or preferred by the parties involved in the interpretation of the results. The arrangement technique → analyst → laboratory → location can be changed to analyst → technique → laboratory → location. By incorporating additional levels, more arrangements are possible. Another important feature is the entire visualisation of the sources of variation and their interrelation; for instance, it is possible to assess the analyst variability against the instrumental variability without the need for detailed model building.

The degrees of freedom involved at each level have not been a matter of concern in the present application, for the sake of clarity, but the use of experimental design as a tool for proficiency testing requires an appropriate selection of them in order to bring out the optimum advantages that experimental design can offer.

The described hierarchical classification design and the error decomposition approaches can be applied in the study of parameters other than those considered in this work. For instance, the parameter instrumental technique (L_ijk) can be replaced by the variable digestion method.
Table 1
Main error terms and significance

       Error                                                                    Fcalculated
       Technique        Analyst          Laboratory       Location              Technique  Analyst  Laboratory  Location
       Within  Between  Within  Between  Within  Between  Within  Between

A      0.600  0.023  1.543  0.137  0.115  1.069  0.376  0.029  0.234  0.224  0.290  45.179  0.024  3.890  0.023
B      0.150  0.049  0.514  0.062  0.919  0.198  1.297  1.197  3.456  0.315  0.000  0.003  0.097  0.246  10.138

Tabulated F-test values and associated degrees of freedom are given as follows. A: F2/6 = 5.143 (technique) and F2/24 = 3.403 (analyst) at 95% confidence; B: F1/4 = 7.709 (technique), F1/10 = 4.965 (analyst), F1/16 = 4.494 (laboratory) and F1/43 = 4.085 (location) at 95% confidence.
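The F-comparisons summarised in Table 1 follow the classical pattern: a between mean square divided by the corresponding within mean square, compared against the tabulated critical value. The Python sketch below (invented data, not the article's results) shows this for three groups of triplicates, using the tabulated F2/6 = 5.143 quoted in the footnote:

```python
# Three groups of triplicate results; invented numbers.
groups = [[10.1, 10.3, 10.2], [10.4, 10.6, 10.5], [10.0, 10.2, 10.1]]

N = sum(len(g) for g in groups)
grand = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

# Within and between mean squares (cf. Eqs. (6) and (7)).
ms_w = (sum((xm - mu) ** 2 for g, mu in zip(groups, means) for xm in g)
        / sum(len(g) - 1 for g in groups))            # df = 6
ms_b = (sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
        / (len(groups) - 1))                          # df = 2

F = ms_b / ms_w
# Significant at the 95% level if F exceeds the tabulated F(2,6) = 5.143.
print(round(F, 2), F > 5.143)  # 13.0 True
```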
8. Concluding remarks
The experimental design based on hierarchical classification with an unequal number of laboratory, analyst and technique levels per location has permitted the comparison of different sources of variation involved in proficiency testing experiments using simulated data.
The information on the relative sizes of the between and within variances for instrumental techniques, analysts, laboratories and geographical locations can be used to design further trials more efficiently by minimizing the effects of the greater variance sources. The application of hierarchical classification designs in proficiency testing can improve the comparability of measurement data by providing a simultaneous comparison of the various sources of error involved in the experiments. This is important, for it enables the laboratory manager to measure the efficacy of the overall quality system instead of assessing the levels under study separately.
The approach used in this article to demonstrate how the confidence changes as the design is altered has proved to be a valuable strategy for selecting the best experimental arrangements for proficiency testing experiments without actually performing an experiment.
References
[1] E.A. Maier, P. Quevauviller, B. Griepink, Anal. Chim. Acta 283 (1993) 590–599.
[2] I. Taverniers, M. de Loose, E. Van Bockstaele, Trends Anal. Chem. 23 (2004) 535–552.
[3] ISO/IEC Guide 43-1, Proficiency Testing by Interlaboratory Comparisons, ISO, Geneva, 1997, Part 1.
[4] Kuselman, M. Pavlichenko, Accred. Qual. Assur. 9 (2004) 387–390.
[5] R.G. Brereton, Chemometrics: Applications of Mathematics and Statistics to Laboratory Systems, Ellis Horwood, Chichester, 1990, ISBN 0131313509.
[6] D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, L. Kaufman, Chemometrics: A Textbook, Elsevier, Amsterdam, 1988, ISBN 0444426604.
[7] P.W. Araujo, R.G. Brereton, Trends Anal. Chem. 15 (1996) 63–70.
[8] D.J. Finney, Experimental Design and its Statistical Basis, Cambridge University Press, London, 1955, ISBN 0226250016.
[9] H. Working, H. Hotelling, J. Am. Stat. Assoc. 24 (1929) 73–85.
[10] P.W. Araujo, R.G. Brereton, Analyst 122 (1997) 621–630.
[11] A.M.H. Van Der Veen, T.L. Hafkenscheid, Accred. Qual. Assur. 9 (2004) 657–661.
[12] C. Philippe, A. Marschal, Trends Anal. Chem. 23 (2004) 178–184.
[13] M. Alon, K.L. Duffin, P.C. Isakson, Anal. Biochem. 235 (1996) 73–81.