Espiritu 4.pdf

    Analytica Chimica Acta 462 (2002) 87–100

    Chemometrics characterisation of the quality of river water

    Darinka Brodnjak-Voncina a,*, Danilo Dobcnik a, Marjana Novic b, Jure Zupan b

    a Faculty of Chemistry and Chemical Engineering, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia

    b National Institute of Chemistry, Ljubljana, Slovenia

    Received 7 March 2002; accepted 3 April 2002

    Abstract

    Within the period from autumn 1990 to spring 1999 (from October to April in each period), 207
    samples were collected from the Mura river, Slovenia, and 19 physical and chemical variables
    were measured. These variables are: river flow, water temperature, air temperature, dissolved
    oxygen, deficit of oxygen, oxygen saturation index, chemical oxygen demand (COD) in unfiltered
    and filtered samples, biochemical oxygen demand after 5 days (BOD5) in unfiltered and filtered
    samples, pH, conductivity, ammonium, nitrite, nitrate, and phosphate concentrations, adsorbable
    organic halogens (AOX), dissolved organic carbon (DOC), and suspended solids. Different
    chemometric methods were employed for handling the results of all measurements: (i) basic
    statistical methods for the determination of mean and median values, standard deviations,
    minimal and maximal values of the measured variables, and their mutual correlation
    coefficients; (ii) principal component analysis (PCA); and (iii) a clustering method based on
    the Kohonen neural network. The influences of season, month, sampling site, and sampling time
    on the pollutant levels were examined. Before 1993, the pulp and paper industry was the main
    source of pollutants because of large amounts of chlorine emission as a consequence of an
    industrial treatment, the leaching of cellulose. After 1993, the technology was changed and the
    quality of the river water improved. The improvement could be detected 1 year after the change
    of technology. For a part of the water samples, river quality classes based on biological
    parameters were also determined. The correlation between the biologically determined quality
    classes and the chemical measurements was sought. Consequently, the biological classification
    of the water samples based on the chemical analyses was studied. © 2002 Elsevier Science B.V.
    All rights reserved.

    Keywords: Water quality; Mura river; Principal component analysis; Classification

    1. Introduction

    Physical and chemical studies of surface water, namely the Mura river water, have been
    performed within a joint Slovenian–Austrian project since 1966. Here, we present the data
    collected in the period 1990–1999. Throughout this period the quality of the water was followed
    and every year a classification was made on the basis of chemical parameters and biological
    analyses. There are four main biological classes: class I, unpolluted to very slightly polluted
    (oligosaprobic); class II, moderately polluted (beta-mesosaprobic); class III, heavily polluted
    (alpha-mesosaprobic); and class IV, excessively polluted (polysaprobic). Additionally, there
    are three subclasses of water quality between the four main classes: I–II for slightly
    polluted, II–III for critically polluted, and III–IV for very heavily polluted water. In the
    reported period, all the samples from the Mura river were classified into one of the following
    three river quality classes: class II, moderately polluted; class II–III, critically polluted;
    or class III, heavily polluted. Since the biological investigations are time consuming, it
    would be worthwhile to find a way to reduce their number and replace them with chemical
    analyses. The aim of this work is to find the correlation between the biological classes and
    the variables obtained by chemical measurements.

    * Corresponding author. Tel.: +386-2-229-44-32; fax: +386-2-252-77-74.
    0003-2670/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
    PII: S0003-2670(02)00298-2

    Chemometric methods have often been used for the classification and comparison of different
    samples [1]. Examples include the differentiation of rainwater compositional data by principal
    component analysis (PCA) [2], the application of chemometric techniques to the analysis of
    Suquia River water quality [3], the identification of sources of bottom waters in the Weddell
    Sea by PCA and target estimation [4], and the determination of the correlation of chemical and
    sensory data in drinking waters by factor analysis [5], to name just a few. Chemometric methods
    have also been used for evaluating environmental data of lagoon water [6], San Francisco Bay
    and Estuary [7], and Muggia Bay in the Northern Adriatic Sea [8], as well as for the
    oceanographic characterisation of the northern São Paulo coast [9]. Partial least squares (PLS)
    was used for the simultaneous spectrophotometric determination of calcium and magnesium in
    water [10]. PCA and PLS were used for the characterisation of wastewater in Australia
    (Melbourne) [11]. An example of using Kohonen maps is given in a paper discussing the
    unsupervised training, clustering and classification of multivariate biological data [12].

    The quality of the Mura river water was studied over 9 years (nine sampling seasons), from
    1990/1991 to 1998/1999. The measurements were performed from October to April, each week on the
    same day, at the same hour, and at the same sampling site. During the summer months sampling
    was not carried out because the excessive river flow caused a high water level. Additionally,
    twice a year samples were collected at four different sites along the river. The first sampling
    site was in the middle of the river Mura, in Spielfeld (Austria). The other three sites were
    situated 10 km downstream, in the middle of the river and on both riverbanks, the left one
    being in Austria (Bad Radkersburg) and the right one in Slovenia (Gornja Radgona). Altogether,
    19 variables were measured for all 207 samples collected and analysed during this period. The
    measurements were performed alternately by two institutions: the Faculty of Chemistry and
    Chemical Engineering, Maribor, Slovenia, and the Amt der Steiermarkischen Landesregierung,
    Graz, Austria. Twice a year, in October and in February, both participating institutions took
    samples and their results were compared. The average values were used in further treatments.

    2. Experimental

    2.1. Sampling

    A standard method was used for sampling [13]. Water was collected in polyethylene bottles
    0.5 m below the surface at four sampling sites along the river. All glass and plastic ware used
    for sampling and analyses was rinsed with Milli-Q water. Filtration was carried out through
    glass-fibre filters. All measurements were performed on the same day as the samples were
    gathered.

    2.2. Instruments and reagents

    A Dionex ion chromatograph and a Perkin-Elmer spectrophotometer were used for some ion
    determinations, a Dohrmann apparatus for AOX and TOC determinations, and a WTW conductivity
    meter and a WTW oximeter for the corresponding analyses. All reagents were of analytical grade.
    A Milli-Q system was used for purifying the water.

    2.3. Analytical methods

    Standard methods were used for the determination of DOC [14], AOX [15], COD [16], BOD5 [17],
    suspended solids [18], and pH [19].

    2.4. Spectrophotometric determinations

    The samples were filtered before the analyses. Ammonium was determined by the reaction of
    ammonium with salicylate and hypochlorite ions in the presence of sodium
    nitrosopentacyanoferrate [20]; nitrate by the 2,6-dimethylphenol method [21] and also by ion
    chromatography [22]; nitrite by the reaction of NO2− ions with sulphanilamide, yielding an
    intensely coloured diazonium salt [23], and also by ion chromatography; and orthophosphate by
    the ammonium molybdate method [24]. The absorbances were measured at the λmax of the
    particular component.

    2.5. Data analysis

    The 207 samples are characterised by 19 physico-chemical variables: river flow, water
    temperature, air temperature, dissolved oxygen, deficit of oxygen, oxygen saturation index,
    chemical oxygen demand (COD) in unfiltered and filtered samples, biochemical oxygen demand
    after 5 days (BOD5) in unfiltered and filtered samples, pH, electrical conductivity, ammonium,
    nitrite, nitrate, and phosphate concentrations, adsorbable organic halogens (AOX), dissolved
    organic carbon (DOC), and suspended solids (see Table 1). These variables are the components of
    the vector representation of each sample, which is used in the further chemometric analysis.

    The results of all measurements were investigated by different chemometric methods [1]: basic
    statistical methods for the determination of mean and median values, standard deviations,
    minimal and maximal values of the measured variables, and their mutual correlation
    coefficients. PCA [1,25] and artificial neural networks [26] were applied for grouping the
    water samples according to the measured variables. Among the different neural networks, Kohonen
    self-organising maps [27] are the most suitable for clustering, while counterpropagation
    artificial neural networks (CP ANNs) are good as a modelling method [26,28–31]. All the
    calculations and plots in the following (PCA) section were done with the Teach/Me software
    [25], using its data analysis option, which is one of the applications of the Teach/Me system
    and provides very flexible tools for most fields of data analysis.

    3. Results and discussion

    3.1. Statistical screening of data

    After determining the mean and median values and the standard deviations, the mutual
    correlation was sought for all measured variables. The estimation of the pollution should not
    depend on the river flow. As most of the measured variables, except for pH, the temperatures,
    oxygen saturation, and conductivity, are related to the river flow, they were scaled by
    multiplication with the river flow values given in m³ s⁻¹. The units of the scaled variables
    were changed correspondingly. In this way, the river flow is eliminated as a separate variable,
    leaving only 18 variables in the vector representation of the samples. The maximal correlation
    coefficient of the scaled data was found between the deficit of oxygen and the oxygen
    saturation index (R = 0.96), which is to be expected because the latter is obtained from a
    known relationship between the oxygen saturation at different temperatures and the dissolved
    oxygen. Large correlations (R > 0.84) were found between the filtered and unfiltered values of
    COD and between the filtered and unfiltered values of BOD5, as expected.
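
The flow scaling and the correlation screening described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function names, the column layout, and the synthetic numbers are hypothetical and are not the Mura river data or the software used in the study.

```python
import numpy as np

def scale_by_flow(data, flow, flow_dependent):
    """Multiply the flow-dependent columns of `data` by the river flow (m^3/s).

    `flow_dependent` lists the column indices to scale (an assumption of this
    sketch); all other columns, e.g. pH, are left unchanged.
    """
    scaled = np.asarray(data, dtype=float).copy()
    flow = np.asarray(flow, dtype=float)
    scaled[:, flow_dependent] *= flow[:, None]
    return scaled

def correlation_matrix(data):
    """Pearson correlation coefficients between the columns of `data`."""
    return np.corrcoef(data, rowvar=False)

# Synthetic illustration only (hypothetical values, not measured data).
rng = np.random.default_rng(0)
n = 50
flow = rng.uniform(50.0, 200.0, size=n)                  # river flow
ph = rng.normal(7.8, 0.2, size=n)                        # not flow-dependent
cod_unf = rng.uniform(5.0, 20.0, size=n)                 # COD, unfiltered
cod_fil = 0.8 * cod_unf + rng.normal(0.0, 0.5, size=n)   # COD, filtered
data = np.column_stack([ph, cod_unf, cod_fil])

scaled = scale_by_flow(data, flow, flow_dependent=[1, 2])
R = correlation_matrix(scaled)
print(np.round(R, 2))
```

In this synthetic case the two COD columns remain strongly correlated after scaling, which is the kind of pattern the screening step is meant to reveal.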

    The AOX variable shows the greatest change in water quality over the past 9 years. The pulp and
    paper industry was the main source of pollutants because of large amounts of chlorine emission
    as a consequence of an industrial treatment, the leaching of cellulose. After 1993 the
    technology was changed: in the new leaching process chlorine was omitted, and the quality of
    the river water improved. The improvement could be detected no sooner than 1 year after the
    change of technology. A good correlation of AOX with the sampling time, expressed by
    translating the 9-year period into days, is evident from Fig. 1, which shows the marked fall of
    the AOX variable after 1994 and, consequently, the improvement of water quality.

    The number of micro-organisms responsible for the self-cleaning of the water kept decreasing
    until 1995. The changed industrial procedure of cellulose leaching helped to improve the water
    quality, so the number of micro-organisms started growing in 1995. BOD5 values also increased.
    It is evident that the river needed 2 years for self-remediation.

    3.2. Principal component analysis (PCA)

    PCA was performed in order to get an overall impression of how the 207 water samples, described
    by physical and chemical variables, relate to the quality of water in different seasons,
    months, or sampling sites. The original data depending on the water flow were multiplied by the
    individual water flow values, as described in Section 3.1. PCA was applied to a matrix of
    207 × 18 elements, in which the 207 rows represent the water samples and the 18 columns the
    variables.

    Fig. 1. Plot of the normalised AOX variable against the sampling time, expressed by translating
    the 9-year period into days. The marked fall of this parameter, and consequently the
    improvement of water quality, after the year 1994 is evident. Samples are numbered from 1 to
    207.

    The data were additionally pre-processed in two different ways. First, column centring was
    used, which means that the mean value of each column was subtracted from its individual (207)
    elements. Second, autoscaling of the individual variables, also called column standardisation,
    was performed. With this procedure the mean of the column elements is subtracted from the
    individual elements and the result is divided by the column standard deviation. Consequently,
    each column has zero mean and unit variance. The percentages of variance in the resulting
    eigenvectors (PCs) for both types of pre-processing are shown in Table 2.
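
The effect of the two pre-processing modes on the variance captured by each PC can be illustrated with a short NumPy sketch. It assumes PCA is computed by singular value decomposition of the pre-processed matrix; the function name and the synthetic data (one variable with a much larger numerical range than the others, loosely mimicking a dominant variable) are hypothetical and do not reproduce the Teach/Me calculation.

```python
import numpy as np

def pca_variance_percent(X, autoscale=False):
    """Percentage of total variance carried by each principal component.

    Columns are always mean-centred; with autoscale=True they are also
    divided by their standard deviation (zero mean, unit variance).
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    if autoscale:
        Z = Z / X.std(axis=0, ddof=1)
    # Squared singular values are proportional to the PC variances.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return 100.0 * var / var.sum()

# Synthetic illustration: three independent variables, the third on a much
# larger scale (hypothetical numbers, not the river data).
rng = np.random.default_rng(1)
X = rng.normal(size=(207, 3)) * np.array([1.0, 2.0, 1000.0])

centred = pca_variance_percent(X)
auto = pca_variance_percent(X, autoscale=True)
print("column centring:", np.round(centred, 2))
print("autoscaling:    ", np.round(auto, 2))
```

With column centring alone the large-scale variable takes almost all of the variance in PC1, whereas autoscaling spreads the variance over the components, which is the same qualitative contrast as between the two halves of Table 2.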

    From Table 2, it can be seen that with column-centred data 99.8% of the variance is gathered in
    the first two PCs. However, analysing the composition of the first and second PCs showed that
    almost all of this variance is that of AOX (variable 18, v18 in Table 1). Consequently, such an
    analysis would differ little from analysing plots of the samples by v18 (AOX) against v7 (COD)
    or v17 (suspended solids), which are the next two most informative variables. For this reason,
    only the PCA using autoscaled variables was analysed further. With the autoscaled variables,
    49.5% of the total variance was captured by the first two principal components. Any conclusion
    on the basis of plots in the space of PC1 and PC2 would neglect >50% of the total information
    about the data. Some rough indications were nevertheless derived from the obtained distribution
    of the transformed samples; however, for further evaluation of the water samples other
    chemometric methods, such as Kohonen and counterpropagation ANNs, were implemented.

    Table 2
    Comparison of variances in PCA using two different scaling modes, column centring of data
    (m = 0.0) and autoscaling (m = 0.0, s = 1.0)

    PC    Column centring of data       Column standardisation (autoscaling) of data
          Variance (%)   Total (%)      Variance (%)   Total (%)
     1       99.42         99.42           35.47          35.47
     2        0.38         99.80           13.99          49.46
     3        0.10         99.90           10.83          60.29
     4        0.04         99.94            7.50          67.79
     5        0.02         99.96            5.50          73.29
     6        0.02         99.98            4.59          77.89
     7        0.01         99.99            3.83          81.72
     8        0.01        100.00            3.63          85.35
     9        0.00        100.00            2.91          88.26
    10        0.00        100.00            2.74          91.00
    11        0.00        100.00            2.24          93.25
    12        0.00        100.00            1.98          95.23
    13        0.00        100.00            1.52          96.75
    14        0.00        100.00            1.11          97.85
    15        0.00        100.00            0.70          98.55
    16        0.00        100.00            0.56          99.12
    17        0.00        100.00            0.46          99.58
    18        0.00        100.00            0.42         100.00

    In Fig. 2, the biplot resulting from the PCA of the water samples represented by 18 variables
    is shown. It can be seen that the first component, PC1, is associated with a group of variables
    such as nitrite and nitrate concentrations, phosphate, suspended solids, AOX, COD, and BOD. The
    second component, PC2, represents mainly the dependence on temperature (variables 1 and 2,
    printed bold in Fig. 2, correspond to v2 and v3 in Table 1, respectively). It is evident from
    Fig. 2 that the samples separated from the main central cluster and distributed in the region
    of larger PC1 values were all collected before the year 1994 (sample labels

    Fig. 2. Biplot (scores and loadings) of 207 samples and 18 variables in the PC1–PC2
    co-ordinate system for water samples of the river Mura. The sample numbers from 1 to 207 are
    given in Table 1, while the original variables (1–18, printed bold in the biplot) forming the
    PC1 and PC2 components are defined in Table 1 as v2–v19, because the water flow (variable v1)
    was previously eliminated (see Section 3.1, statistical screening of data).

    ANN was trained for 240 epochs, which was sufficient for a satisfactory recognition of the
    training samples. The 18 components of each sample's vector representation are the
    physico-chemical variables described in Section 2. The maximal and minimal correction factors
    in the modelling procedure were 0.4 and 0.01, respectively. The prediction results for the 56
    training samples are shown in Fig. 3.
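
To make the training parameters above more concrete, the following is a minimal sketch of a simplified counterpropagation network in Python with NumPy. Only the 240 epochs, the correction factors 0.4 and 0.01, the 18 inputs, and the 56 training samples are taken from the text. The 6 × 6 map size, the linear decay of the correction factor, the triangular neighbourhood on square rings, the numeric coding of the classes as 2.0, 2.5 and 3.0, and all function names are assumptions made for illustration, and the data are synthetic. It is not the authors' implementation.

```python
import numpy as np

def train_cpann(X, y, grid=(6, 6), epochs=240, eta_max=0.4, eta_min=0.01, seed=0):
    """Train a simplified counterpropagation network (Kohonen + output layer)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rows, cols = grid
    # Grid position of every neuron, shape (rows*cols, 2).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Kohonen weights start at random points inside the data range.
    w_in = rng.uniform(X.min(axis=0), X.max(axis=0), size=(rows * cols, X.shape[1]))
    # Output weights start at the mean target value.
    w_out = np.full(rows * cols, y.mean())
    total = epochs * len(X)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / max(total - 1, 1)
            eta = eta_max - (eta_max - eta_min) * frac   # assumed linear decay
            radius = max(rows, cols) * (1.0 - frac)      # shrinking neighbourhood
            winner = np.argmin(((w_in - X[i]) ** 2).sum(axis=1))
            dist = np.abs(coords - coords[winner]).max(axis=1)  # square rings
            a = np.where(dist <= radius, 1.0 - dist / (radius + 1.0), 0.0)
            # Move input and output weights of the neighbourhood towards the sample.
            w_in += (eta * a)[:, None] * (X[i] - w_in)
            w_out += eta * a * (y[i] - w_out)
            step += 1
    return w_in, w_out

def predict_cpann(model, X):
    """Return the output weight of the winning neuron for each row of X."""
    w_in, w_out = model
    X = np.asarray(X, dtype=float)
    d = ((X[:, None, :] - w_in[None, :, :]) ** 2).sum(axis=2)
    return w_out[d.argmin(axis=1)]

# Synthetic illustration: 56 samples, 18 variables, three class codes.
rng = np.random.default_rng(42)
classes = np.repeat([2.0, 2.5, 3.0], [20, 18, 18])
offsets = (classes - 2.5) * 6.0                      # shifts clusters apart
X_train = rng.normal(size=(56, 18)) + offsets[:, None]

model = train_cpann(X_train, classes)
pred = predict_cpann(model, X_train)
print(np.round(pred[:5], 2), np.round(pred[-5:], 2))
```

Because every update is a weighted average of the current output weight and a target value, the predictions of this sketch always stay within the range of the training class codes.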

    In Fig. 3, the regression line between the experimental and predicted biological class numbers
    of the training samples is shown. The standard deviation of the prediction residuals,
    SEP = 0.247, and the correlation coefficient, R = 0.958, show that the CP ANN model trained
    with 56 samples describes a good correlation between the 18-component vector representation of
    the samples (physico-chemical properties) and the biological classes.
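
The statistics quoted above can be computed from paired experimental and predicted class numbers as in the following Python sketch. It assumes that SEP is taken as the root mean square of the prediction residuals, which may differ in detail from the definition used in the study, and that the classes are coded numerically (here 2.0, 2.5 and 3.0, a hypothetical coding). The example values are invented for illustration.

```python
import numpy as np

def prediction_statistics(experimental, predicted):
    """SEP, correlation coefficient and regression line (intercept A, slope B)."""
    experimental = np.asarray(experimental, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    residuals = predicted - experimental
    sep = np.sqrt(np.mean(residuals ** 2))            # assumed definition of SEP
    r = np.corrcoef(experimental, predicted)[0, 1]
    slope, intercept = np.polyfit(experimental, predicted, 1)
    return {"SEP": sep, "R": r, "A": intercept, "B": slope}

# Invented example values, not results from the paper.
experimental = [2.0, 2.0, 2.5, 2.5, 3.0, 3.0]
predicted = [2.1, 1.9, 2.4, 2.6, 2.9, 3.1]
stats = prediction_statistics(experimental, predicted)
print({k: round(float(v), 3) for k, v in stats.items()})
```

A slope B close to 1 and an intercept A close to 0 indicate that the predictions follow the experimental classes without systematic bias, while SEP and R summarise the scatter around that line.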

    The constructed model was tested with the remaining 151 samples (out of 207), for which the
    biological class was neither known nor determined experimentally. Since there is no information
    about the experimental biological classes of these 151 samples, the quality of the prediction
    results cannot be confirmed. However, the trend of improving water quality assessed by the
    biological classification of the 56 training samples is obvious. The resulting predictions with
    respect to the sampling year are shown in Fig. 4.

    The biological classes predicted for the 151 samples show the same trend of improvement of
    water quality as observed for the 56 training samples. Biological investigations are time
    consuming in comparison with the determination of physico-chemical parameters, so even a rough
    prediction of biological class numbers is helpful.

    Fig. 3. Regression line of the predictions for the 56 training samples with the constructed
    CP ANN model. A and B are the estimated parameters, intercept and slope, of the regression
    line. Their standard errors are also given. S.D. is the estimated standard deviation of the
    fit, and R the correlation coefficient between the experimental and predicted biological
    classes.

    Fig. 4. Prediction of the biological class numbers of 151 samples using the CP ANN model. The
    samples are discriminated by the year in which they were gathered.

    4. Conclusions

    The study has given us the opportunity to follow all processes involved in the complex system
    of surface water pollution. The time series of overall pollution levels, as well as the results
    for specific measured parameters, are important indicators and can be used for planning
    short-term and long-term preventive action. In this work, standard multivariate statistical
    methods and PCA were used for pre-screening of the data. It was shown that it is necessary to
    use autoscaled variables. From the results, it was concluded that the PCA method is not
    discriminating enough, since the variables are weakly correlated: less than 50% of the variance
    is explained by the first two principal components. For the classification of this kind of
    data, non-linear methods such as artificial neural networks are more suitable. Artificial
    neural networks were implemented as the method for clustering all 207 water samples as well as
    for predicting the biological classes. The analysis has shown that the AOX content is the
    parameter with the greatest discriminating power. The results obtained from the evaluation of
    the data gathered during the 9-year monitoring of the Mura river water confirmed that the
    improvement of water quality during the last 9 years is significant and, therefore, the
    Austrian project for improving the quality of rivers can be considered successful.

    One of the goals of the research presented in this work was to find a correlation between
    biological classes and chemical parameters. Because the biological analyses are time consuming,
    only a small number of water samples were chosen for the determination of biological classes.
    The experience-based CP ANN model was built using the water samples for which the biological
    activity was known. The rest of the samples were then examined with the constructed model to
    obtain predictions of their biological activity. The predicted values were in the same range as
    the values of the training samples; moreover, the trend of water quality improvement was
    evident from the predicted biological activities. Although the usual validation procedures for
    estimating the quality of the model were not applicable because of the low number of available
    training samples, the overview of the prediction results indicates that the biological activity
    obtained from the proposed model is of significant value when experimental values are not
    available.

    Acknowledgements

    The authors thank the Ministry of Education, Science and Sport of the Republic of Slovenia
    (contract numbers P1-0507-0104 and P1-0508-0104) for financial support. The Amt der
    Steiermarkischen Landesregierung, Graz, Austria, is kindly acknowledged for completing the data
    on the Mura river water samples with their results.

    References

    [1] D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J.S. Verbeke, Handbook of Chemometrics and Qualimetrics: Part A, Elsevier, Amsterdam, 1997.
    [2] P. Zhang, N. Dudley, A.M. Ure, D. Littlejohn, Anal. Chim. Acta 258 (1992) 1–10.
    [3] W.D. Alberto, D.M. Del Pilar, A.M. Valeria, P.S. Fabiana, H.A. Cecilia, B.M. De Los Angeles, Water Res. 35 (2001) 2881–2894.
    [4] R. Lindegren, M. Josefson, Chemometr. Intell. Lab. Syst. 44 (1998) 403–409.
    [5] A.K. Meng, I.H. Suffet, Environ. Sci. Technol. 31 (1997) 337–345.
    [6] E. Marengo, M.C. Gennaro, D. Giacosa, C. Abrigo, G. Saini, M.T. Avignone, Anal. Chim. Acta 317 (1995) 53–63.
    [7] W.M. Jarman, G.W. Johnson, C.E. Bacon, J.A. Davis, R.W. Risebrough, R. Ramer, Fresenius J. Anal. Chem. 359 (1997) 254–260.
    [8] P. Barbieri, G. Adami, A. Favretto, E. Reisenhofer, Fresenius J. Anal. Chem. 361 (1998) 349–352.
    [9] M.M.C. Ferreira, C.G. Faria, E.T. Paes, Chemometr. Intell. Lab. Syst. 47 (1999) 289–297.
    [10] J.B. Marzo, M.J.M. Hernandez, S. Sagrado, E. Bonet, R. Gimenes, J. Chemometr. 12 (1998) 323–336.
    [11] M.P. Kallio, S.P. Mujunen, G. Hatzimihalis, P. Koutoufides, P. Minkkinen, P.J. Wilkie, M.A. Connor, Anal. Chim. Acta 393 (1999) 181–191.
    [12] M.F. Wilkins, L. Boddy, C.W. Morris, Binary-Comput. Microb. 6 (1994) 64–72.
    [13] Water Quality - Sampling - Part 11: Guidance on Sampling of Ground Waters, ISO 5667-11: 1992 (E).
    [14] Water Quality, Guidelines for the Determination of Total Organic Carbon (TOC), ISO 8245: 1987 (E).
