Looking at Analytical Data

Brian LISTER

British Geological Survey, Geochemistry Directorate, London WC1X 8NG, United Kingdom.

Geostandards Newsletter, Vol. 9, No. 2, October 1985, pp. 263-273

Using a statistical package assembled primarily for the evaluation of ore standards produced by BGS, the opportunity has been taken to re-examine not only data on earlier BGS reference materials but also data published by other organizations. It is concluded that, often due to the difficulties of processing large numbers of results, data are not always looked at as closely as they should be. This can cause incorrect evaluations to be made. Other problems encountered in the assessment of inter-laboratory data are discussed. It is suggested that one of the most effective ways of examining replicate data is by plotting them sequentially as S-distribution curves.

The writer has recently completed the evaluation of four base metal concentrates as reference materials. This task was made easier by using a data processing program assembled for the NERC Honeywell 66/DPS-300 computer. This orders the data and then calculates various statistical parameters including some robust estimates not usually found in statistical packages. Data may be eliminated using a choice of criteria and the parameters recalculated. At any stage, the data may be plotted sequentially as S-distribution curves. It is undoubtedly a great advance from our first survey of ore analysis in 1968 (1), where an electro-mechanical calculator was used.

The statistical package enabled the 5000 data from 27 laboratories to be examined and re-examined in detail. The plots of the data, in particular, were invaluable because any anomalous distribution shapes were visible immediately. Outliers could thus be identified and the validity of their rejection confirmed mathematically. To adapt a well-known saying, "a picture is worth a thousand numbers" and it is the great value of the distribution curves that has occasioned the title of this paper.

The comparative ease with which data could be examined by this method led to a retrospective look at data sets from previous work initiated by this laboratory and then to a look at data published by other workers in the field. The conclusion is that we do not always make the best use of our data, sometimes because inappropriate methods are used, sometimes due to the difficulties of dealing with large numbers of data. The latter often results in the processing of data mechanically without looking at them. A computer enables the tedium to be removed from data processing but it then becomes even more important that the data should be examined carefully. Two comments made in previous papers bear repetition: "In interpreting data..., a rigidly mathematical approach on the one hand or an empirical approach on the other..., will not yield the best results. Methods need to vary depending on the number and quality of the data" (2). "Finally, in the assessment of analytical data, there is one technique that should never be overlooked - an open mind, a fresh look at the data and commonsense" (3).

It must be emphasized that the purpose of this paper is not to criticise our own previous work or that of others but to point some of the ways to a more searching appraisal of data. Methods are continually evolving and, although those that were used 20 years ago may still be satisfactory when applied to good data, they may prove very fallible when used with data of lesser quality. There is no definitive way of evaluating analytical data but the appearance of the data, when plotted, may well suggest the best approach. That is the main theme of this paper.

DATA EVALUATION

We can now look at some data and see how their evaluation might be improved, starting with some generated by this Institute and continuing with data from other laboratories.

Note that in the figures below, the abscissa represents concentration of element and the ordinate, number of data. The three vertical lines show the mean and one standard deviation




either side of it. Different symbols have been used for some methods of analysis but, as these are not always consistent, no key is given here. In the tables, n, x̄ and s are the number of results, mean and standard deviation; √b1 and b2 are the skewness and kurtosis which, in a normal distribution, should be 0 and 3, respectively; M is the median and GM, the Gastwirth median (4); 12A is Hampel's M-estimate (5), 25% is the 25% trimmed mean (5) and DCM is the dominant cluster mode (6).
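The moment statistics listed above can be sketched as follows. This is only a minimal Python illustration and not the BGS package: the function name `describe` and the sample values are invented for the example, and the use of central moments with divisor n for √b1 and b2 is the usual textbook convention, assumed here rather than taken from the paper.

```python
import statistics


def describe(data):
    """Return n, mean, s, sqrt(b1), b2 and the median of a data set.

    Assumption: sqrt(b1) = m3 / m2**1.5 and b2 = m4 / m2**2, where m_k is
    the k-th central moment computed with divisor n.
    """
    values = sorted(data)
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "s": statistics.stdev(values),  # sample standard deviation (divisor n - 1)
        "sqrt_b1": m3 / m2 ** 1.5,
        "b2": m4 / m2 ** 2,
        "median": statistics.median(values),
    }


# Invented, perfectly symmetrical values, used only to exercise the function.
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```

For a normal distribution the two shape statistics should approach 0 and 3, which is the yardstick applied in the tables that follow.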

Fluorine in IGS 39

In the original data processing (7), as the skewness and kurtosis were well within the limits for a normal distribution, no data were eliminated. However, as Figure 1 shows clearly, there are outliers visible at both top and bottom, fourteen in all. If these are eliminated, the distribution is reasonably symmetrical if somewhat platykurtic. Robust estimates, using the original 52 data, suggest a higher value than the accepted mean of 46.69%. After elimination (see Table 1), this becomes even more clear-cut and a value of 46.85% would now be suggested.

Table 1. IGS 39, F%

n      52      38
x̄      46.69   46.82
s      0.64    0.25
√b1    0.19    -0.34
b2     3.28    2.34
M      46.83   46.86
GM     46.77   46.86
12A    46.77   46.85
25%    46.77   46.85
DCM    46.69   46.84

Figure 1. IGS 39, Fluorine %

Tantalum in IGS 33

There were originally 53 results by XRF and gravimetry (7) (Figure 2). The distribution became approximately normal by removing the three lowest data but it still remained negatively skewed. The mean increased from 4.19 to 4.29%, the median remaining the same at 4.37% (see Table 2). This was the one occasion in the evaluation of the 20 reference materials IGS 20-39 where the mean and median differed so much that the median was preferred as the best estimate of a true value. However, it now appears that the DCM at 4.40% would have been an even better choice. If the most obvious discrepant results are removed (five at the top and thirteen at the bottom), the distribution is very nearly normal and the median becomes 4.39, the DCM remaining at 4.40%. The robustness of Hampel's estimate is well illustrated since, in reducing the data from 53 to 35, it changes by only 0.02. The use of some form of mode for skewed data is illustrated by the DCM, which gives a good estimate of a true value without removing any data. The almost vertical portion of the distribution curve suggests that the true value could be slightly higher than 4.40%.
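The way robust estimates resist a discrepant result can be sketched in Python. The sketch assumes the usual definition of the Gastwirth median (0.3, 0.4 and 0.3 weights on the one-third quantile, the median and the two-thirds quantile) and a trimmed mean that discards 25% of the data at each end; the interpolation rule in `quantile`, the function names and the data are all invented for illustration, and Hampel's 12A and the DCM are not attempted.

```python
import statistics


def quantile(sorted_values, p):
    """Quantile by linear interpolation between order statistics (assumed convention)."""
    position = (len(sorted_values) - 1) * p
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def gastwirth_median(data):
    """Weighted combination of the 1/3 quantile, the median and the 2/3 quantile."""
    values = sorted(data)
    return (0.3 * quantile(values, 1 / 3)
            + 0.4 * quantile(values, 0.5)
            + 0.3 * quantile(values, 2 / 3))


def trimmed_mean(data, proportion=0.25):
    """Mean after discarding the given proportion of results at each end."""
    values = sorted(data)
    cut = int(len(values) * proportion)
    return statistics.fmean(values[cut:len(values) - cut])


# Invented data: eight well-behaved results and one gross high outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
plain_mean = statistics.fmean(data)
robust_gm = gastwirth_median(data)
robust_tm = trimmed_mean(data)
print(round(plain_mean, 2), round(robust_gm, 2), round(robust_tm, 2))
```

The arithmetic mean is dragged far above the bulk of the data by the single outlier, whereas the two robust estimates stay at the centre of the remaining results without any data being removed.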

Figure 2. IGS 33, Tantalum %

Table 2. IGS 33, Ta%

n      53      50      35
x̄      4.19    4.29    4.37
s      0.58    0.42    0.11
√b1    -1.22   -0.33   0.11
b2     5.05    4.02    2.95
M      4.37    4.37    4.39
GM     4.34    4.34    4.38
12A    4.37    4.37    4.39
25%    4.33    4.35    4.38
DCM    4.40    4.40    4.40


Barium in IGS 38

There were originally 48 data with a mean of 52.00 and a median of 51.35%. Four high outliers are obvious (Figure 3) and were eliminated at the time. However, it would be preferable to remove the four low data also. Agreement between different estimators is still not entirely satisfactory but 51.54% is probably the best choice (see Table 3).

Figure 3. IGS 38, Barium %

Table 3. IGS 38, Ba%

n      48      44      40
x̄      52.00   51.46   51.65
s      2.04    0.96    0.78
√b1    1.96    -0.20   0.36
b2     6.67    2.55    1.96
M      51.35   51.26   51.35
GM     51.57   51.44   51.52
12A    51.48   51.47   51.55
25%    51.58   51.44   51.55
DCM    51.20   51.20   51.20

Copper in MP-1a

These data (8) show very well that, with a good series of analyses, evaluation poses few problems. Because outliers occur more or less symmetrically (see Figure 4), the different estimators of a true value are in good agreement. The high skewness and kurtosis, however, show that the distribution is not normal (Table 4). Four low and eight high outliers can be identified and eliminated from the original 135 data. Estimates change but little and agree with that made by the originators of the material.

Figure 4. MP-1a, Copper %

Table 4. MP-1a, Cu%

n      135          122
Range  1.35-1.65    1.37-1.49
x̄      1.41         1.44
s      0.047        0.019
√b1    2.57         0.17
b2     13.30        2.74
M      1.44         1.44
GM     1.44         1.44
12A    1.44         1.44
25%    1.44         1.44
DCM    1.44         1.44

That different methods of copper analysis can yield different results for the same sample has been noted more than once (2, 9). Here, the bulk of the data are by AAS and give an estimate of 1.44%, agreeing with the overall value. The titrimetric data are slightly lower at about 1.43%. There are only ten results by electrogravimetry from two laboratories but the lowest of these is 1.44%, so that this method gives a high value. As the data by titrimetry and electrolysis are comparatively few and the differences slight, not too much significance should be attached to these findings, but the trend is there.

Tungsten in MP-1a

There are 60 results for tungsten in the base metal ore, MP-1a, ranging from 0.029 to 0.053%. The originators of this material have given only a provisional value for this element because of the lack of consensus. Figure 5 shows this very clearly. The cause could be inhomogeneity for this element or poor quality analysis but even a provisional value is optimistic.

Figure 5. MP-1a, Tungsten %

Total iron as Fe2O3 in FeR-1

This iron formation sample has been evaluated at 75.86% by its originators (10), using the "select laboratories" method of Abbey (11). This assesses the number of good, fair and poor results for each participating laboratory and, from these, calculates a rating for each laboratory. Only results from laboratories whose ratings are above a specified level are used in the evaluation. This method appears to work very well in practice and has the great merit that the data are studied in detail rather than blindly submitted to some rigid mathematical process. There are two main criticisms. First, because of the nature of the rejection procedures, it is difficult to construct confidence intervals. Secondly, if a laboratory has returned a mixture of good and bad data, depending on the element, method or analyst, it seems drastic to reject all the data. It is often found that very capable laboratories fail occasionally when they determine an unfamiliar element or use an unusual technique.

Figure 6 shows one very definite outlier in the 43 results, an additional three low and six high outliers and possibly a further two high ones. The effects of rejecting these are shown in Table 5. They also show that assessment is not always simple even with comparatively high quality data. The writer would remove the four low and eight high data and suggest an estimate of 75.74%. Note again the consistency of Hampel's 12A estimate compared with all but the DCM.

Total iron as Fe2O3 in IF-G

There is a total of 79 results for which the compiler (12) has given a recommended value of 55.85%. This should probably be higher: even with the original data, the median is 55.94 and 12A, 55.96%. Figure 7 shows that there are three very clear low outliers and one high, but the elimination of a further six low results is justified, giving an almost perfect normal distribution. An estimate of 56.00% is suggested (see Table 6).

Figure 6. FeR-1, Total iron as Fe2O3 %

Table 5. FeR-1, total Fe2O3%

n      43      42      33      31
x̄      75.88   76.24   75.82   75.67
s      3.09    2.03    0.87    0.65
√b1    -2.43   0.80    0.75    -0.39
b2     14.56   3.45    4.49    3.40
M      75.80   75.83   75.74   75.72
GM     75.77   75.82   75.74   75.71
12A    75.73   75.73   75.74   75.74
25%    75.83   75.85   75.76   75.73
DCM    75.74   75.74   75.74   75.74

Figure 7. IF-G, Total iron as Fe2O3 %


Table 6. IF-G, total Fe2O3%

n      79      75               69
x̄      55.70   55.[illegible]   56.02
s      0.99    0.77             0.57
√b1    -1.22   -0.61            -0.05
b2     5.89    3.19             2.87
M      55.94   55.95            55.98
GM     55.93   55.95            55.99
12A    55.96   55.96            56.01
25%    55.92   55.94            56.00
DCM    55.94   55.94            55.94

Analytical methods may be assessed separately. The 20 XRF data (Figure 8) have one clear outlier. The 20 AAS results (Figure 9) have one obvious low outlier or, alternatively, four low and one high. The 16 titrimetric data have certainly one and probably two low outliers (Figure 10). As other methods have been used by only a few laboratories, they have not been examined but the data for the three main methods of analysis are summarized in Table 7. XRF apparently gives somewhat higher values and AAS somewhat lower but there seems little doubt that 56%, or perhaps a little higher, is a reasonable estimate.

Manganese in SOIL-5

Figure 11 shows 55 data for manganese which have been given a provisional value of 852 ppm after eliminating one low and four high outliers (13). The writer has always advocated plotting data as distribution curves in preference to histograms or any other method of visual representation because results are not processed in any way, merely ordered. In the construction of histograms, for example, arbitrary intervals must be chosen and shapes can vary considerably by altering those intervals. However, Figure 11 shows that even distribution curves have their dangers. There is one gross outlier so that the variation between the remaining data is compressed, making them appear much better than they actually are. Figure 12 eliminates the single high result and, immediately, variations between the remaining 54 data become much clearer. Figure 13 shows the 50 data used by the original compiler. Since the skewness is outside the 5% limit for 50 data and since the lowest result seems an obvious outlier, it has been removed. This would give an estimate of a true value of the order of 880 ppm (Table 8). Looking at Figure 13 again, there are strong arguments for thinking that all the data below 850 ppm are suspect and that the continuous distribution starting about 890 ppm represents the most reliable results. The DCM, at 922 ppm, supports this view.
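The construction of a distribution curve, and the compression caused by one gross outlier, can be sketched in Python. The function names and the ppm values are invented for illustration (they are not the SOIL-5 results); the only processing applied to the data is ordering, as described above.

```python
def distribution_curve(data):
    """Pair each ordered result with its rank: (concentration, number of data)."""
    return [(value, rank) for rank, value in enumerate(sorted(data), start=1)]


def share_of_axis(data, drop_high=0):
    """Fraction of the full concentration axis spanned once the highest results are dropped."""
    values = sorted(data)
    kept = values[:len(values) - drop_high]
    return (kept[-1] - kept[0]) / (values[-1] - values[0])


# Invented ppm values with one gross high outlier.
results = [850, 880, 900, 920, 950, 6000]
points = distribution_curve(results)
compressed = share_of_axis(results, drop_high=1)
print(points[0], points[-1], round(compressed, 3))
```

With the outlier on the plot, the other five results occupy only a small fraction of the abscissa and so look far more concordant than they are; re-plotting without it restores the detail.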

Figure 8. IF-G, Total iron %, XRF results

Figure 9. IF-G, Total iron %, AAS results

Figure 10. IF-G, Total iron %, Titrimetric results


Table 7. IF-G, main methods of analysis

Method  XRF     XRF     AAS     AAS     AAS     Vol     Vol     Vol
n       20      19      20      19      15      16      15      14
x̄       55.98   56.18   55.85   55.03   56.10   55.79   55.91   55.99
s       1.03    0.50    1.23    0.95    0.63    0.59    0.54    0.43
√b1     -2.72   -0.23   -0.82   0.23    0.38    -1.13   -0.87   -0.62
b2      11.1    3.27    4.59    3.15    2.45    3.56    3.09    2.87
M       56.14   56.15   55.86   55.88   55.97   56.02   56.04   56.05
GM      56.15   56.17   55.90   55.96   56.00   55.96   56.01   56.06
12A     56.19   56.19   55.97   55.99   56.02   55.99   56.01   56.04
25%     56.15   56.19   55.92   55.99   56.02   55.97   55.99   56.03
DCM     56.07   56.07   55.83   55.83   55.78   56.02   56.02   56.02

Figure 11. SOIL-5, Manganese ppm, all results

Figure 12. SOIL-5, Manganese ppm, minus one high result

Figure 13. SOIL-5, Manganese ppm, minus four high results, one low

Table 8. SOIL-5, Mn ppm

Calcium in SOIL-5

Table 9 shows the results and methods used by 12 laboratories for calcium. The range can be seen best in Figure 14. The compilers of the data give a "for information only" mean of 2.20% with a standard error of ± 0.28. However, there are results by five laboratories using three different methods within the range 2.34-2.52%. The probability of the true value being within this range is extremely high and the median of 2.4975 would have been a much better choice as an estimator. This figure lies outside the standard error of the mean. A value of 2.50% could be safely recommended.
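The figures quoted for calcium can be checked directly from the twelve results in Table 9. The short Python sketch below assumes that the compilers' standard error is s/√n, an assumption that reproduces the quoted ± 0.28; the variable names are illustrative only.

```python
import math
import statistics

# Results and methods as listed in Table 9 (SOIL-5, Ca %).
table_9 = [
    ("NAA", 0.248), ("XRF", 0.983), ("XRF", 1.303), ("XRF", 1.630),
    ("NAA", 2.340), ("XRF", 2.495), ("NAA", 2.500), ("NAA", 2.500),
    ("AAS", 2.520), ("NAA", 2.975), ("XRF", 3.300), ("AAS", 3.600),
]
values = [result for _, result in table_9]

mean = statistics.fmean(values)
std_error = statistics.stdev(values) / math.sqrt(len(values))  # assumed definition
median = statistics.median(values)

# Results inside the 2.34-2.52 % cluster and the methods that produced them.
cluster = [(method, result) for method, result in table_9 if 2.34 <= result <= 2.52]
methods = {method for method, _ in cluster}

print(f"mean = {mean:.2f}, standard error = {std_error:.2f}, median = {median:.4f}")
print(f"{len(cluster)} results in the cluster, methods: {sorted(methods)}")
print("median above mean + standard error:", median > mean + std_error)
```

The median falls just above the upper bound of the mean plus its standard error, which is the point made in the text.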

Neodymium in SOIL-5

The lowest and highest values in Table 10 have been eliminated and a mean of 29.9 ppm has been recommended with qualified confidence only. However, with six laboratories in excellent agreement, the data should be very reliable. Since all six laboratories used neutron activation analysis, there is the possibility of a method bias. Otherwise, the value is unlikely to be wrong.


Table 9. SOIL-5, Ca%

Method  Result
NAA     0.248
XRF     0.983
XRF     1.303
XRF     1.630
NAA     2.340
XRF     2.495
NAA     2.500
NAA     2.500
AAS     2.520
NAA     2.975
XRF     3.300
AAS     3.600

Figure 14. SOIL-5, Calcium %

Table 10. SOIL-5, Nd ppm

Mercury in SOIL-5

The data from eleven laboratories are given in Table 11. The original compilers have eliminated the lowest and highest to give a mean of 0.79 ppm. In Figure 15, since the highest result is a gross outlier, it has been eliminated so as not to distort the appearance of the rest (as described above), the lowest result being retained. There is, apparently, one group of data around 0.55 ppm and another above 0.9 ppm and any attempt to evaluate them in terms of a true value would be misleading. There is no ready explanation for the two groups of data but it is important to note their existence and not to give a mean that falls between them.

Figure 15. SOIL-5, Mercury ppm

Nickel in SY-3

The 31 data in Table 12 are taken from a valuable compilation by Abbey (11), who also employs them as an example in the use of different methods of evaluation. Inspection of these data suggests that the value of 125 ppm is a gross outlier but both Figure 16, before its elimination, and Figure 17, afterwards, show a distribution which does not appear normal. It is positively skewed and the shape is tending towards that of the lognormal distribution, in which the shape is normal if logarithms of the data are plotted. This can be characteristic of some trace-element analysis due to inhomogeneity: one sub-sample may have one mineral grain containing the element, another, three, and so on, giving a wide spread with little agreement. If the logarithms of the data are plotted, with or without the high value of 125 ppm, as in Figures 18 and 19, there is some improvement to the shape, particularly in the latter case, though it is still some way from the characteristic elongated S-shape of a normal distribution. Skewness and kurtosis are much improved, however.
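The effect of taking logarithms on skewness can be sketched in Python. The values below are invented (a simple geometric series, not the SY-3 results, which are not reproduced here), and the skewness is computed as √b1 = m3 / m2^1.5 with divisor-n central moments, an assumed convention.

```python
import math


def skewness(data):
    """sqrt(b1) = m3 / m2**1.5 using central moments with divisor n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Invented values forming a geometric series: 1, 2, 4, ..., 256.
raw = [2.0 ** k for k in range(9)]
logged = [math.log10(x) for x in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(round(raw_skew, 2), round(log_skew, 2))
```

The raw values are strongly positively skewed, while their logarithms are evenly spaced and therefore symmetrical. Real trace-element data would be expected to show only a partial improvement of this kind, as the text notes for nickel.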

Table 12. SY-3, N i ppm

Figure 16. SY-3, Nickel ppm, all results

Whether the 125 ppm result is spurious or a valid analysis of that particular sub-sample, it is difficult to say. In a normal distribution it would be anomalous; in a lognormal distribution it would be nothing out of the ordinary. The arithmetic mean is 16 but most other estimates give a value of 11 ppm or less, the DCM at 7.9 being the lowest. An unbiased efficient estimate of the lognormal mean (4) is about 14.5 or, eliminating the single high result, 12.2 ppm. The situation is indeed confusing and there is probably no ready explanation. If these were geochemical exploration sample analyses, then the average nickel content of the rock would probably be about 14 ppm but there is too much variation for the material to be suitable as a trace metal standard.

Figure 17. SY-3, Nickel ppm, one result omitted

Figure 18. SY-3, Nickel ppm, logarithms of results

DISCUSSION

The above examples show that we do not always examine our data as carefully as we should. Sometimes, re-appraisal confirms the original interpretation; sometimes it merely adds confusion to an already ambiguous situation. More than once, however, re-appraisal has substituted a different value or rejected any evaluation as inappropriate.

Figure 19. SY-3, Nickel, logarithms of results, one omitted

The writer makes no claim that the methods that have been used are necessarily the best but rather that the data have been studied more closely than may sometimes happen. It is understandable that workers sometimes process large numbers of data mechanically but this is a disservice to the analysts who have provided them. Two results of 5 and 10% may be processed to give a mean of 7.5 and a standard deviation of 3.5 but this has very little meaning within the context of a reference material.
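The arithmetic of this example is easily reproduced. The Python lines below use the sample standard deviation (divisor n - 1), which is assumed to be the form intended; the relative standard deviation is added only to show how little such a pair of results constrains the value.

```python
import statistics

results = [5.0, 10.0]                 # the two results quoted in the text, in %
mean = statistics.fmean(results)
std_dev = statistics.stdev(results)   # sample standard deviation, divisor n - 1
relative_sd = 100 * std_dev / mean    # spread as a percentage of the mean

print(mean, round(std_dev, 1), round(relative_sd))
```

The calculation runs without complaint, yet a spread approaching half of the mean shows why the processed figures carry very little meaning.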

Non-normal distributions

As a general rule, data obtained in an inter-laboratory certification programme, as well as replicate analyses within one laboratory, will follow an approximately normal distribution. Trace elements, however, may tend towards the lognormal, as is the case with nickel in SY-3. This is likely if the element in question is present in sporadic discrete grains, as in alluvial gold. Under these circumstances, agreement between determinations is unlikely unless large sub-samples are taken for analysis. If a trace element is diffused in the crystal lattice of a mineral, this problem is likely to be less pronounced.

Another distribution that may be met in the evaluation of data is the negative lognormal, which is negatively skewed. According to Koch and Link (10), this may occur when observations are near to an upper limit set by some natural process, for example, iron in a hematite or lead in a lead concentrate. Presumably, under these circumstances, where concentration is near an upper limit, there are greater numbers of low than high data, causing the negative skewness. Although the writer has not encountered this distribution, it is as well to be aware that it can occur. Iron in IF-G (Figure 7), which is negatively skewed, is certainly tending this way.

Figure 20 illustrates the shape of a typical lognormal distribution. It represents nine hypothetical data varying between 1 and 54 ppm. If the figure is turned through 180°, it then represents a negative lognormal distribution of nine data from 55.54 to 60.90%.

Figure 20. Lognormal and negative lognormal distributions
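
The effect of turning the figure through 180° can be sketched numerically. The nine values below are not those plotted in Figure 20; they are assumed for illustration, spaced evenly in log space between 1 and 54 ppm, and the linear rescaling onto 55.54 to 60.90% is likewise an assumption. The sketch only shows that reversing the concentration axis changes the sign of the skewness and leaves its magnitude unchanged.

```python
def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# ASSUMPTION: nine illustrative values, evenly spaced in log space, 1 to 54 ppm.
# They are hypothetical and are not read from Figure 20.
LOW_PPM, HIGH_PPM, N = 1.0, 54.0, 9
lognormal_like = [LOW_PPM * (HIGH_PPM / LOW_PPM) ** (i / (N - 1)) for i in range(N)]

# Turning the figure through 180 degrees reverses the concentration axis.
# ASSUMPTION: the reversed axis is rescaled linearly onto 55.54 to 60.90 %.
LOW_PCT, HIGH_PCT = 55.54, 60.90
scale = (HIGH_PCT - LOW_PCT) / (HIGH_PPM - LOW_PPM)
negative_lognormal_like = [HIGH_PCT - (v - LOW_PPM) * scale for v in lognormal_like]

skew_positive = skewness(lognormal_like)
skew_negative = skewness(negative_lognormal_like)

print("skewness, lognormal-like data:          %+.3f" % skew_positive)
print("skewness, negative lognormal-like data: %+.3f" % skew_negative)
```

The long tail of the first set lies on the high side; after reflection it lies on the low side, which is the shape expected when concentrations approach an upper limit.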

Rounding of data

Figure 4 is a good illustration of another difficulty in the evaluation of data. There are groups of determinations at 1.42, 1.43, 1.44% and so on, caused by the rounding up or down of AAS readings. It can only be assumed that there is an equal probability of either occurring and that, for instance, if there are ten results of 0.14 and ten results of 0.15%, then it is probable that the true result lies nearer to 0.145% than to either 0.14 or 0.15. Rounding can also conceal true intra-laboratory variance: two results of 0.143 and 0.136, when rounded, show no variance, whereas 0.143 and 0.146 show increased variance.
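
The arithmetic of the rounding examples above may be checked with a short sketch. It uses only the figures quoted in the text; the function name and the choice of the sample variance (n - 1 denominator) are assumptions made for illustration.

```python
from statistics import mean, variance


def variance_before_and_after_rounding(pair, decimals=2):
    """Sample variance of a pair of results, unrounded and rounded."""
    rounded = [round(x, decimals) for x in pair]
    return variance(pair), variance(rounded)


# 0.143 and 0.136 both round to 0.14: the variance disappears.
concealed_raw, concealed_rounded = variance_before_and_after_rounding([0.143, 0.136])

# 0.143 and 0.146 round to 0.14 and 0.15: the variance is exaggerated.
inflated_raw, inflated_rounded = variance_before_and_after_rounding([0.143, 0.146])

# Ten results of 0.14 and ten of 0.15 average close to 0.145.
mixed_mean = mean([0.14] * 10 + [0.15] * 10)

print("0.143, 0.136: variance %.2e -> %.2e" % (concealed_raw, concealed_rounded))
print("0.143, 0.146: variance %.2e -> %.2e" % (inflated_raw, inflated_rounded))
print("mean of ten 0.14 and ten 0.15: %.3f" % mixed_mean)
```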

Numbers of determinations

The re-evaluations above have been carried out on data of different types. This was not commented upon lest it made the central issue of data appraisal unnecessarily complicated. In some cases, the data consisted of the original analyses as returned by participating laboratories, whereas in others, the results from each laboratory have been averaged to give a laboratory mean. Both methods have their advantages and disadvantages.

It is clear that equal weight will be given to data if numbers of determinations from individual laboratories are the same. In reference material programmes, it is usual for the originators to request participating laboratories to return a specified number of results; BGS, for example, normally requests four. This request is rarely complied with by all laboratories. A commercial laboratory, for instance, will often furnish one averaged result in accordance with the usual practice to clients. Some laboratories, on the other hand, will return a long string of data which may cause problems. If the laboratory is of high repute in the analysis of a particular element, no harm may be done but, if the laboratory is less expert or is experimenting with a new technique or an unfamiliar element, it can cause difficulties.

Table 13 shows some hypothetical data from five fictitious laboratories in order to illustrate a number of points. The first four are providing what is apparently the best estimate of a true value but their data are overwhelmed by the ten determinations from Laboratory No 5. This may be corrected by averaging the results from each laboratory or, alternatively, by retaining all data from the first four and partially averaging the data from the fifth so that they total four. In this contrived example, the results from Laboratory No 5 would be rejected by most criteria for the identification of outliers, so that they do not present a serious problem but, if they averaged, say, 10.5%, they would probably not be rejected by any criteria and, unless dealt with in some way, would weight the data towards too high an estimate. This shows again the importance of examining the data and the S-curves in the light of all available information.
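
The weighting effect can be made concrete with the hypothetical data of Table 13. The sketch below compares the mean of all individual results with the mean of the five laboratory means; the variable names are illustrative only, and no outlier rejection is applied.

```python
from statistics import mean

# Hypothetical Cu % results from Table 13 (five fictitious laboratories).
TABLE_13 = {
    1: [9.80, 10.10, 10.10, 10.00],
    2: [9.9, 9.95, 10.02, 10.13],
    3: [10.0, 10.0, 9.95, 10.05],
    4: [9.98, 9.99, 9.97, 10.06],
    5: [12.0, 12.1, 12.0, 12.2, 12.1, 12.0, 12.0, 12.1, 11.9, 12.2],
}

# Every individual result weighted equally: Laboratory 5 supplies 10 of 26.
all_results = [x for results in TABLE_13.values() for x in results]
mean_of_all_results = mean(all_results)

# Every laboratory weighted equally: one mean per laboratory.
lab_means = {lab: mean(results) for lab, results in TABLE_13.items()}
mean_of_lab_means = mean(lab_means.values())

# Laboratories 1 to 4 alone, for comparison.
mean_labs_1_to_4 = mean(x for lab in (1, 2, 3, 4) for x in TABLE_13[lab])

print("mean of all 26 results:     %.2f" % mean_of_all_results)
print("mean of 5 laboratory means: %.2f" % mean_of_lab_means)
print("mean of Laboratories 1-4:   %.2f" % mean_labs_1_to_4)
```

Averaging by laboratory reduces the pull of the long series from Laboratory No 5 but does not remove it; only rejection of those results restores the value of 10.00%.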

If we now look at the effects of averaging the data from Laboratories Nos 1 to 4, the mean, in all cases, is 10.00%, so that by doing this, we have eliminated all signs of intra-laboratory variation, producing a standard deviation of zero. A simple way of overcoming this problem is to average the individual laboratory standard deviations, which gives 0.0807, not very different from the deviation of the 16 individual results, 0.0816.
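
These figures can be reproduced from Table 13. The sketch assumes the sample standard deviation (n - 1 denominator), which is the form that returns the quoted values of 0.0807 and 0.0816; the variable names are illustrative.

```python
from statistics import mean, stdev

# Hypothetical Cu % results for Laboratories 1 to 4 (Table 13).
LABS_1_TO_4 = {
    1: [9.80, 10.10, 10.10, 10.00],
    2: [9.9, 9.95, 10.02, 10.13],
    3: [10.0, 10.0, 9.95, 10.05],
    4: [9.98, 9.99, 9.97, 10.06],
}

# Averaging first: the four laboratory means are identical, so the
# standard deviation of the means carries no information on variation.
lab_means = [mean(results) for results in LABS_1_TO_4.values()]
sd_of_lab_means = stdev(lab_means)

# Remedy suggested in the text: average the individual laboratory
# standard deviations (ASSUMPTION: sample form, n - 1 denominator).
lab_sds = [stdev(results) for results in LABS_1_TO_4.values()]
mean_of_lab_sds = mean(lab_sds)

# Alternative: standard deviation of all individual results together.
individual_results = [x for results in LABS_1_TO_4.values() for x in results]
sd_of_individual_results = stdev(individual_results)

print("laboratory means:              ", [round(m, 2) for m in lab_means])
print("s.d. of laboratory means:       %.4f" % sd_of_lab_means)
print("mean of laboratory s.d.s:       %.4f" % mean_of_lab_sds)
print("s.d. of %d individual results:  %.4f"
      % (len(individual_results), sd_of_individual_results))
```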

Different organisations tackle this problem in different ways. Some average results from each laboratory and take a standard deviation of the averaged data, while others process each individual result. Another method that may be used (8,15) is an analysis of variance technique to separate between- and within-laboratory variation. The writer makes use of each individual result in the processing of data, believing that to average data before processing is, to some extent, to suppress information. Nevertheless, should any laboratory provide long series of data, there will be a check to ensure that they do not result in any wrong conclusions. It is unlikely that any evaluation will encounter data as anomalous as those in Table 13 but it is always wise to check. It seems inevitable that, if the originator of a reference material receives, say, six results for ten elements from each of 80 laboratories, some simplification of the vast amounts of data will be necessary.
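
The separation of between- and within-laboratory variation can be sketched with a textbook one-way analysis of variance applied to the hypothetical data of Table 13. This is an assumed, simplified form and is not necessarily the procedure used in references (8) or (15); the function name is illustrative.

```python
from statistics import mean

# Hypothetical Cu % results from Table 13.
TABLE_13 = {
    1: [9.80, 10.10, 10.10, 10.00],
    2: [9.9, 9.95, 10.02, 10.13],
    3: [10.0, 10.0, 9.95, 10.05],
    4: [9.98, 9.99, 9.97, 10.06],
    5: [12.0, 12.1, 12.0, 12.2, 12.1, 12.0, 12.0, 12.1, 11.9, 12.2],
}


def one_way_anova(groups):
    """Return (between-group mean square, within-group mean square).

    Plain one-way ANOVA; unequal group sizes are handled by weighting
    each group mean by its number of results.
    """
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (k - 1), ss_within / (n_total - k)


# Laboratories 1 to 4: identical means, so all variation is within-laboratory.
msb_four, msw_four = one_way_anova([TABLE_13[lab] for lab in (1, 2, 3, 4)])

# All five laboratories: the between-laboratory term dominates.
msb_five, msw_five = one_way_anova(list(TABLE_13.values()))

print("Labs 1-4: between MS = %.6f, within MS = %.6f" % (msb_four, msw_four))
print("Labs 1-5: between MS = %.4f, within MS = %.6f" % (msb_five, msw_five))
```

A between-laboratory mean square several hundred times the within-laboratory one, as found when Laboratory No 5 is included, is the numerical counterpart of the break that would be seen on plotting the data.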

Table 13. Cu %

Lab. No. 1   Lab. No. 2   Lab. No. 3   Lab. No. 4   Lab. No. 5
   9.80         9.9         10.0          9.98      12.0   12.0
  10.10         9.95        10.0          9.99      12.1   12.0
  10.10        10.02         9.95         9.97      12.0   12.1
  10.00        10.13        10.05        10.06      12.2   11.9
                                                    12.1   12.2

FINAL COMMENTS

S-distribution curves, first brought to the writer's notice in the assessment of the rock standards, G-1 and W-1 (16), are a simple and effective means of looking at data before and during processing. Their value lies in the fact that data are depicted in an entirely unaffected manner: no cumulative percentages, no class intervals to be chosen, merely the data, plotted sequentially from the lowest to the highest. They are valuable whatever method of data assessment is finally chosen but, especially so, when there are large quantities of multi-element data and it becomes inevitable that they must be submitted to some "number crunching" program in which important anomalies might be lost to view.
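
The construction of such a curve requires very little. The sketch below is not the BGS program; it is a minimal illustration, with assumed function names, that orders the hypothetical data of Table 13 and pairs each value (abscissa) with its sequence number (ordinate). A crude text plot is printed so that no plotting library is needed.

```python
# Hypothetical Cu % results from Table 13, all five laboratories together.
DATA = [
    9.80, 10.10, 10.10, 10.00,
    9.9, 9.95, 10.02, 10.13,
    10.0, 10.0, 9.95, 10.05,
    9.98, 9.99, 9.97, 10.06,
    12.0, 12.1, 12.0, 12.2, 12.1, 12.0, 12.0, 12.1, 11.9, 12.2,
]


def s_curve_points(data):
    """Order the data and pair each value with its sequence number (1..n)."""
    return [(value, rank) for rank, value in enumerate(sorted(data), start=1)]


def largest_gap(points):
    """Largest step in concentration between consecutive ordered data."""
    values = [value for value, _ in points]
    return max(b - a for a, b in zip(values, values[1:]))


def text_plot(points, width=60):
    """Crude plot: one row per datum, highest sequence number at the top."""
    low, high = points[0][0], points[-1][0]
    rows = []
    for value, rank in reversed(points):
        column = int((value - low) / (high - low) * (width - 1))
        rows.append("%3d |%s*" % (rank, " " * column))
    return "\n".join(rows)


points = s_curve_points(DATA)
gap = largest_gap(points)

print(text_plot(points))
print("largest gap between consecutive data: %.2f" % gap)
```

With these data the results from Laboratory No 5 appear as a separate limb of the curve, set apart from the rest by a gap of 1.77%, which is visible at once without any calculation.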

ACKNOWLEDGEMENTS

It is important that inter-laboratory data obtained in reference material programmes are easily accessible to others and the writer is grateful to journals such as "Geostandards Newsletter" that this is so. This paper is published with the approval of the Director, British Geological Survey (NERC).

SUMMARY

The reference ore samples prepared by the British Geological Survey have been re-evaluated with the aid of a new series of statistical programs. On this occasion, a number of reference samples prepared by other organisations have also been re-examined. The conclusion reached is that the difficulty of handling a large number of data prevents their thorough examination, and that this can lead to an incorrect evaluation of the data. Other problems encountered in the evaluation of inter-laboratory results are discussed. It is suggested that one of the effective methods of examining compiled data is to construct an S-distribution curve by arranging them sequentially.


REFERENCES

(1) B. Lister and M.J. Gallagher (1970) An inter-laboratory survey of the accuracy of ore analysis, Trans. Inst. Min. Metall., Sect. B, 79: B213-237.

(2) B. Lister (1978) The preparation of twenty ore standards, IGS 20-39. Preliminary work and assessment of analytical data, Geostandards Newsletter, 2: 157-186.

(3) B. Lister (1982) Evaluation of analytical data: a practical guide for geoanalysts, Geostandards Newsletter, 6: 175-205.

(4) J. Gastwirth (1966) On robust procedures, J. Am. Stat. Assoc., 61: 929-948.

(5) D.F. Andrews et al. (1972) Robust estimates of location: survey and advances, Princeton University Press, 373p.

(6) P.J. Ellis, I. Copelowitz and T.W. Steele (1977) Estimation of the mode by the dominant cluster method, Geostandards Newsletter, 1: 123-130.

(7) B. Lister (1977) Second inter-laboratory survey of the accuracy of ore analysis, Trans. Inst. Min. Metall., Sect. B, 86: B133-148.

(8) H.F. Steger and W.S. Bowman (1982) MP-1a: A certified reference ore, Canmet Report 82-14E, 33p.

(9) B. Lister and J. van der Linden (In press) The preparation of Bougainville copper concentrate as reference material, IGS 45.

(10) S. Abbey, C.R. McLeod and Wan Liang-Guo (1983) FeR-1, FeR-2, FeR-3 and FeR-4: Four Canadian iron-formation samples prepared for use as reference materials, Geol. Surv. of Canada Paper 83-19, 51p.

(11) S. Abbey (1983) Studies in "standard samples" of silicate rocks and minerals 1969-1982, Geol. Surv. of Canada Paper 83-15, 114p.

(12) K. Govindaraju (1984) Report (1984) on two GIT-IWG geochemical reference samples: albite from Italy, AL-I and iron formation sample from Greenland, IF-G, Geostandards Newsletter, 8: 63-113.

(13) R. Dybczyński, A. Tugsavul and O. Suschny (1979) Soil-5, a new IAEA certified reference material for trace element determinations, Geostandards Newsletter, 3: 61-87.

(14) G.S. Koch and R.F. Link (1970) Statistical analysis of geological data, Vol. I, Wiley, 375p.

(15) J. Mandel and R.C. Paule (1970) Interlaboratory evaluation of a material with unequal numbers of replicates, Analytical Chemistry, 42: 1194-1197.

(16) R.E. Stevens et al. (1960) Second report on a cooperative investigation of the composition of two silicate rocks, Bull. U.S. Geol. Surv., no. 1113, 126p.