
Journal of Environmental Sciences Supplement (2009) S154–S157

A new water quality assessment model based on projection pursuit technique

ZHANG Chi¹, DONG Sihui²,*

1. School of Civil and Hydraulic Engineering, Dalian University of Technology, Dalian 116024, China. E-mail: [email protected]
2. School of Environment Science and Engineering, Dalian Jiaotong University, Dalian 116028, China

Abstract

A new water quality assessment model was built based on the projection pursuit technique. A large number of samples were used to increase the model's precision. A new genetic algorithm combined with a traditional constrained optimization method was proposed and applied to the model optimization; it can deal effectively with global optimization problems subject to various restrictions. The case study shows that the model gives an appropriate assessment of water quality. Moreover, it can determine the index weights objectively or provide information for decision makers, which is difficult for other assessment methods.

Key words: water quality; assessment model; projection pursuit; genetic algorithm

Introduction

Water quality assessment is an important part of environmental management and decision making. To assess water quality, a water quality classification model is built first and then used to evaluate the water quality level from the measured index values. Because the water quality level is reflected jointly by all of the influential indexes, a multi-index assessment model must be established to obtain an appropriate assessment.

To date, there is no universal water quality assessment index system or assessment criterion in China. Water quality assessment models have been proposed based on ANN (Luo et al., 2004; Jiang et al., 2007), fuzzy pattern recognition (Tian et al., 2005), the combined weight method (Jin et al., 2004), and the projection pursuit method (Zhang et al., 2000). However, none of these models can determine objective weights while also taking decision makers' bias toward certain indexes into account, or their precision is low because they lack a large number of samples.

In this article, a new water quality assessment model based on the projection pursuit technique is proposed. The model is built from a large number of samples generated according to the water quality criteria. It projects all the indexes of each sample linearly and classifies the samples by their projection scores along the optimal projection direction vector. The optimal projection direction vector reflects the importance of the indexes, so objective index weights can be calculated from it, which is difficult to achieve with other assessment methods. Furthermore, the model can assess samples according to decision makers' bias.

* Corresponding author. E-mail: [email protected]

1 Assessment model based on projection pursuit

The basic idea of projection pursuit (Friedman and Tukey, 1974; Bobbie et al., 2005) is to project high-dimensional data onto a low-dimensional space according to certain rules; to measure, with a projection objective function, how likely the projection is to expose a certain structure; to find the projection direction vector that optimizes the objective function; and to analyze the structural characteristics of the high-dimensional data through the projection scores (Michael, 2005). The analyst can classify the samples directly by inspecting the projection plot, so the outcome is easy to interpret and the operation is convenient. Optimizing the projection objective function is the key to applying the model. It is a constrained multivariable problem that is difficult to solve with a traditional optimization method or a conventional genetic algorithm alone, so in this article the genetic algorithm is combined with a traditional optimization method to solve it.

The water quality assessment model is constructed and applied in the following steps:

(1) Construction of projection data.

Samples are generated according to the water quality criteria; that is, a group of samples is generated randomly within each level area, where a level area is the region in which all indexes belong to the same level. Each sample consists of the water quality index values $X_{ij}$ and the corresponding true level $Y_i$, where $X_{ij}$ (i = 1, 2, ..., n; j = 1, 2, ..., m) denotes the value of index j in sample i, n is the number of samples, m is the number of indexes, and $Y_i$ denotes the true level of sample i. The more serious the water pollution, the higher the water quality level: the least polluted level is set to 1, the next to 2, and so on. Because the indexes differ in magnitude and units, $X_{ij}$ is normalized to $x_{ij}$ so that $x_{ij}$ falls into the interval [0, 1]. Eq. (1) is used for indexes for which a larger value is worse, and Eq. (2) for indexes for which a smaller value is worse.

$$x_{ij} = \frac{X_{ij} - X_{j\min}}{X_{j\max} - X_{j\min}} \tag{1}$$

$$x_{ij} = \frac{X_{j\max} - X_{ij}}{X_{j\max} - X_{j\min}} \tag{2}$$

where $X_{j\min}$ and $X_{j\max}$ are the minimum and maximum values of water quality index j over all samples, respectively.
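Eqs. (1) and (2) amount to column-wise min-max scaling. The following is a minimal Python sketch using only the standard library; the function name `normalize`, the `larger_is_worse` flags, and mapping a constant column to 0 (where both equations would divide by zero) are assumptions made for illustration.

```python
def normalize(samples, larger_is_worse):
    """Min-max normalize every index column with Eq. (1) or Eq. (2).

    samples: list of rows; row i holds the raw values X_i1, ..., X_im.
    larger_is_worse: m booleans; True applies Eq. (1), False applies Eq. (2).
    """
    m = len(samples[0])
    col_min = [min(row[j] for row in samples) for j in range(m)]
    col_max = [max(row[j] for row in samples) for j in range(m)]
    result = []
    for row in samples:
        normalized_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            if span == 0:
                # Assumption: a constant index carries no information; map it to 0.
                normalized_row.append(0.0)
            elif larger_is_worse[j]:
                normalized_row.append((value - col_min[j]) / span)  # Eq. (1)
            else:
                normalized_row.append((col_max[j] - value) / span)  # Eq. (2)
        result.append(normalized_row)
    return result


raw = [[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]]
print(normalize(raw, larger_is_worse=[True, False]))
# [[0.0, 1.0], [1.0, 0.5], [0.5, 0.0]]
```

The first column is treated as "larger is worse" and the second as "smaller is worse", so the smallest raw value in the second column maps to 1.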

(2) Construction of the objective function for projection.

Suppose $P = (p_1, p_2, \ldots, p_m)$ is a unit (normalized) projection direction vector. The normalized values $x_{i1}, x_{i2}, \ldots, x_{im}$ of sample i are projected linearly onto a one-dimensional variable by

$$z_i = \sum_{j=1}^{m} p_j x_{ij}, \quad i = 1, 2, \ldots, n \tag{3}$$

where $z_i$ is the projection score of sample i. The samples can be classified from the one-dimensional distribution plot of the projection points $z_i$ (i = 1, 2, ..., n). The general judging rules are as follows:

(a) If the projection points of some samples lie closer to each other than to the others, those samples probably belong to the same class; the closer they are, the better the projection.

(b) If all projection points clearly gather into several distinct groups, the projection is satisfactory.

Therefore, the projection objective function is built as

$$Q(P) = s(P) \times d(P) \tag{4}$$

where s(P), the standard deviation of the projection scores, reflects the separation between classes, and d(P) reflects the local density within each class:

$$s(P) = \left[ \frac{\sum_{i=1}^{n} (z_i - \bar{z})^2}{n - 1} \right]^{1/2} \tag{5}$$

$$d(P) = \sum_{i=1}^{n} \sum_{j=1}^{n} (R - r_{ij})\, f(R - r_{ij}) \tag{6}$$

where $\bar{z}$ is the mean of $z_1, z_2, \ldots, z_n$ and R is the density window radius. R should be chosen so that each window contains enough points, while R does not increase too quickly as n grows; after extensive trial and error, we chose R = 0.1 s(P). $r_{ij} = |z_i - z_j|$ (i, j = 1, 2, ..., n) is the distance between projection scores, and $f(R - r_{ij})$ is the unit step function: $f(R - r_{ij}) = 1$ when $R - r_{ij} \ge 0$, and $f(R - r_{ij}) = 0$ otherwise.
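Eqs. (3) to (6) can be evaluated directly. The following minimal Python sketch (standard library only, O(n²) in the number of samples) assumes the rows of `x` are already normalized with Eq. (1) or (2), $r_{ij} = |z_i - z_j|$, and R = 0.1 s(P) as chosen above; the names `projection_scores` and `projection_quality` are illustrative, not taken from the article. As written in Eq. (6), the double sum runs over all ordered pairs, including i = j.

```python
import math


def projection_scores(p, x):
    """Eq. (3): z_i = sum_j p_j * x_ij for every normalized sample row."""
    return [sum(pj * xij for pj, xij in zip(p, row)) for row in x]


def projection_quality(p, x, radius_factor=0.1):
    """Eq. (4): Q(P) = s(P) * d(P), with window radius R = radius_factor * s(P)."""
    z = projection_scores(p, x)
    n = len(z)
    mean = sum(z) / n
    # Eq. (5): sample standard deviation of the projection scores.
    s = math.sqrt(sum((zi - mean) ** 2 for zi in z) / (n - 1))
    radius = radius_factor * s
    # Eq. (6): sum over all ordered pairs (i, j), including i == j.
    d = 0.0
    for zi in z:
        for zj in z:
            gap = radius - abs(zi - zj)  # R - r_ij
            if gap >= 0:  # unit step f(R - r_ij) = 1
                d += gap
    return s * d


print(projection_quality([1.0], [[0.0], [0.0], [1.0], [1.0]]))
```

For four points at 0, 0, 1 and 1 along a single index, only the eight pairs at distance 0 fall inside the window, so d(P) = 8R and Q = 0.8 s(P)² = 0.8/3.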

(3) Optimization of the projection objective function.

Once the water quality indexes are fixed, the projection objective function Q varies only with the projection direction P. Different projection directions reflect different structural characteristics of the sample data, and the best projection direction P* is the one most likely to expose the structure of the high-dimensional data. Therefore, the best projection direction is found by solving the following model:

$$\max_{P} \; Q(P) = s(P) \times d(P) \tag{7}$$

$$\text{s.t.} \quad \sum_{j=1}^{m} p_j^2 = 1 \tag{8}$$

If decision makers are biased toward some indexes, corresponding restrictions for those indexes can be added to the model. For example, if decision makers consider the sixth index to be the most important, i.e., the sixth component $p_6$ of P should be the largest, the following restriction is added to the model:

$$p_6 \ge p_j \quad (j = 1, 2, 3, \ldots, m;\ j \ne 6) \tag{9}$$
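One simple way to keep candidate directions feasible with respect to restrictions (8) and (9) is a repair step, sketched below in Python. This is an assumption for illustration, not the article's method (the article handles restrictions inside a traditional optimizer): the hypothetical `repair` takes absolute values (assuming non-negative components), raises the favored component to the current maximum, and rescales to unit length. Positive rescaling preserves the ordering of the components, so restriction (9) survives the normalization. Indexes are 0-based in the code, so the sixth index is `favored=5`.

```python
import math


def repair(p, favored=None):
    """Map a candidate direction onto restriction (8) and, optionally, (9)."""
    p = [abs(v) for v in p]  # assumption: non-negative components
    if favored is not None:
        p[favored] = max(p)  # restriction (9): p_favored >= p_j for all j
    norm = math.sqrt(sum(v * v for v in p))
    if norm == 0.0:  # degenerate input: fall back to equal components
        return [1.0 / math.sqrt(len(p))] * len(p)
    return [v / norm for v in p]  # restriction (8): unit length


def is_feasible(p, favored=None, tol=1e-9):
    """Check restriction (8) and, if favored is given, restriction (9)."""
    unit_length = abs(sum(v * v for v in p) - 1.0) <= tol
    bias_ok = favored is None or all(p[favored] >= v - tol for v in p)
    return unit_length and bias_ok


candidate = [0.3, 0.1, 0.5, 0.2, 0.4, 0.35]
fixed = repair(candidate, favored=5)  # the sixth index must be the largest
print(is_feasible(candidate, favored=5), is_feasible(fixed, favored=5))
# False True
```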

This optimization model is a complex nonlinear problem with several restrictions, and it is difficult to solve with traditional optimization methods or a conventional genetic algorithm. Genetic algorithms (Bobbie et al., 2005; Ifarraguerri and Chang, 2000) can solve global optimization problems effectively. However, when restrictions are present, a conventional genetic algorithm usually converges very slowly and fluctuates strongly during the optimization (Chipperfield and Fleming, 1995). Traditional optimization methods, on the other hand, can solve local optimization problems with restrictions. Therefore, this article presents a hybrid optimization method that combines the genetic algorithm with a traditional optimization method to find the optimal solution of the model.

The basic idea of the method is to replace each solution in the genetic algorithm population, together with its objective value, by the local optimum reached from it with a traditional optimization method. That is, each initial solution generated by the genetic algorithm is used as the starting point of a traditional iterative method, such as the gradient method or Newton's method, which yields a local optimal solution corresponding to that starting point. The local optimal solution and its objective value are returned to the genetic algorithm and replace the original solution and its objective value.
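A minimal Python sketch of this hybrid scheme follows, under several assumptions not stated in the article. Restrictions (8) and (9) are enforced by a hypothetical `repair` step (absolute values, favored component raised to the maximum, rescaling to unit length). The gradient or Newton step is swapped for a simple coordinate pattern search, because $(R - r_{ij}) f(R - r_{ij}) = \max(R - r_{ij}, 0)$ makes Q non-smooth. The genetic operators (truncation selection, blend crossover, Gaussian mutation) and all parameter values are illustrative choices. What the sketch does share with the text is the replacement rule: every individual is immediately replaced by its local optimum and that optimum's objective value.

```python
import math
import random


def projection_quality(p, x):
    """Q(P) = s(P) * d(P) from Eqs. (3)-(6), with R = 0.1 * s(P)."""
    z = [sum(pj * xij for pj, xij in zip(p, row)) for row in x]  # Eq. (3)
    n = len(z)
    mean = sum(z) / n
    s = math.sqrt(sum((zi - mean) ** 2 for zi in z) / (n - 1))  # Eq. (5)
    radius = 0.1 * s
    # Eq. (6): (R - r_ij) * f(R - r_ij) equals max(R - r_ij, 0).
    d = sum(max(radius - abs(zi - zj), 0.0) for zi in z for zj in z)
    return s * d


def repair(p, favored=None):
    """Hypothetical repair onto restriction (8) and, if favored is set, (9)."""
    p = [abs(v) for v in p]  # assumption: non-negative components
    if favored is not None:
        p[favored] = max(p)  # p_favored >= p_j for all j
    norm = math.sqrt(sum(v * v for v in p))
    if norm == 0.0:
        return [1.0 / math.sqrt(len(p))] * len(p)
    return [v / norm for v in p]


def local_search(p, x, favored=None, step=0.05, min_step=1e-3, max_iter=100):
    """Coordinate pattern search: a stand-in for the gradient/Newton step."""
    best = repair(p, favored)
    best_q = projection_quality(best, x)
    for _ in range(max_iter):
        if step < min_step:
            break
        improved = False
        for j in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[j] += delta
                trial = repair(trial, favored)
                q = projection_quality(trial, x)
                if q > best_q:
                    best, best_q, improved = trial, q, True
        if not improved:
            step /= 2  # shrink the pattern when no move helps
    return best, best_q


def hybrid_ga(x, favored=None, pop_size=8, generations=5, sigma=0.1, seed=0):
    """GA whose individuals are always replaced by their local optima."""
    rng = random.Random(seed)
    m = len(x[0])
    # Initial population: random directions, each refined by local search.
    population = [local_search([rng.random() for _ in range(m)], x, favored)
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda item: item[1], reverse=True)
        parents = population[:max(2, pop_size // 2)]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            pa, pb = rng.sample(parents, 2)
            w = rng.random()
            # Blend crossover of the two parent directions plus Gaussian mutation.
            child = [w * ai + (1 - w) * bi + rng.gauss(0.0, sigma)
                     for ai, bi in zip(pa[0], pb[0])]
            # Replace the child by its local optimum and that objective value.
            children.append(local_search(child, x, favored))
        population = parents + children
    return max(population, key=lambda item: item[1])


if __name__ == "__main__":
    demo = [[0.1, 0.3], [0.12, 0.7], [0.11, 0.5],
            [0.9, 0.4], [0.88, 0.6], [0.92, 0.5]]
    p_star, q_star = hybrid_ga(demo)
    print([round(v, 4) for v in p_star], round(q_star, 4))
```

With the 1000 normalized samples of the case study, each evaluation of Q involves a million pair comparisons, so a practical version would vectorize Eq. (6) or exploit sorted scores; this sketch favors readability.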

(4) Determination of index weights.

In step (3), the optimal projection direction P* reflects the importance of the indexes. Because P* is a unit vector, $p_1^2 + p_2^2 + \cdots + p_m^2 = 1$, so $p_1^2, p_2^2, \ldots, p_m^2$ can be taken as index weights, i.e., $W = (p_1^2, p_2^2, \ldots, p_m^2)$. If the model contains only objective function (7) and restriction (8), i.e., no additional restrictions such as restriction (9), these weights are determined objectively.
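Step (4) reduces to squaring the components of P*. The minimal Python sketch below uses, for illustration, the P* reported in the case study (Section 2) and the index order of Table 1; the names `INDEX_NAMES` and `index_weights` are hypothetical. Because P* is rounded to four decimals, the weights sum to about 0.99995 rather than exactly 1.

```python
INDEX_NAMES = ["total hardness", "nitrate", "nitrite", "sulfate",
               "hypermanganate", "volatile phenol"]


def index_weights(p_star):
    """W = (p_1^2, ..., p_m^2) for a unit projection direction P*."""
    return [pj * pj for pj in p_star]


# P* reported in the case study, without decision-maker bias.
p_star = [0.2358, 0.3689, 0.5111, 0.3077, 0.4641, 0.4868]
weights = index_weights(p_star)
# Rank indexes from most to least important by their weights.
ranked = sorted(zip(INDEX_NAMES, weights), key=lambda nw: nw[1], reverse=True)
for name, w in ranked:
    print(f"{name:16s} {w:.4f}")
```

The ranking puts nitrite first and total hardness last, which matches the conclusion drawn in the case study.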

(5) Classification of the samples

The projection score $z_i$ of each sample is calculated by substituting the optimal P* into Eq. (3). The larger the projection score of a sample, the higher its water quality level, so the samples can be classified according to the differences in $z_i$. Plotting the projection score (horizontal axis) against the classification level (vertical axis) for all samples gives the relation figure between projection score and classification level.

(6) Assessment of the objective samples

The projection scores of the objective samples, i.e., the samples to be assessed, are computed from their index values and the optimal P*. Their classification levels are then obtained by interpolation on the relation figure from step (5).
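Steps (5) and (6) can be sketched in Python as follows. The article does not spell out its interpolation rule, so this sketch assumes one: a score inside a level's range is assigned that integer level, and a score in the gap between two adjacent ranges is interpolated linearly between the upper end of the lower range and the lower end of the upper range. Under this assumption, the ranges and objective-sample scores reported in the case study reproduce the published levels 3.17, 3.97, 3.00, 2.00 and 2.00. Clamping scores that fall outside all ranges is a further assumption, and the function names are hypothetical.

```python
def level_ranges(scores, levels):
    """Step (5): (min, max) projection score of the samples at each level."""
    bounds = {}
    for z, y in zip(scores, levels):
        lo, hi = bounds.get(y, (z, z))
        bounds[y] = (min(lo, z), max(hi, z))
    return [bounds[y] for y in sorted(bounds)]


def interpolate_level(z, ranges):
    """Step (6): level of score z, assumed linear between adjacent ranges."""
    for k, (lo, hi) in enumerate(ranges, start=1):
        if lo <= z <= hi:
            return float(k)
        # ranges[k] is the next level's range (0-based list, 1-based k).
        if k < len(ranges) and hi < z < ranges[k][0]:
            next_lo = ranges[k][0]
            return k + (z - hi) / (next_lo - hi)
    # Outside every range: clamp to the first or last level (assumption).
    return 1.0 if z < ranges[0][0] else float(len(ranges))


# Level ranges and objective-sample scores reported in the case study.
ranges = [(0.025, 0.0929), (0.1460, 0.2325), (0.3062, 0.4771),
          (0.6036, 1.0873), (1.3178, 2.1648)]
site_scores = {"A": 0.4980, "B": 0.5996, "C": 0.4198, "D": 0.2273, "E": 0.2037}
for site, z in site_scores.items():
    print(site, round(interpolate_level(z, ranges), 2))
```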

2 Case study

The data in Table 1 are groundwater quality monitoring results from 5 monitoring sites in Fuzhou, China, in May 1999. According to the characteristics of this area, total hardness, nitrate, nitrite, sulfate, hypermanganate and volatile phenol were chosen as assessment indexes. The groundwater quality assessment criteria were taken from the Groundwater Quality Criteria (GB/T 14848-93) and are listed in Table 2.

If the number of samples is too small, the outcome, i.e., the optimal projection vector, varies with the randomness of the samples. Our trials showed that generating 150 samples in each level area makes the outcome stable. In this article, 200 samples were generated randomly in each level area, each consisting of the six index values and the true level. Because the water quality is classified into 5 levels according to the criteria in Table 2, 1000 samples were obtained in total. The index values were normalized with Eq. (1) or (2).
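The sample-generation step for this case study can be sketched in Python as below. The level limits are the upper limits of Table 2; the article does not state the sampling distribution or the outer bounds, so the sketch assumes uniform sampling, a lower limit of 0 for level 1, and a hypothetical upper limit of twice the level-4 limit for level 5 (`LEVEL5_FACTOR`). Every index in Table 2 worsens as its value grows, so all six columns would then be normalized with Eq. (1).

```python
import random

# Upper limits of levels 1-4 from Table 2 (mg/L), in the column order of Table 1.
UPPER_LIMITS = {
    "total hardness": [150, 300, 450, 550],
    "nitrate": [2, 5, 20, 30],
    "nitrite": [0.001, 0.01, 0.02, 0.1],
    "sulfate": [50, 150, 250, 350],
    "hypermanganate": [1, 2, 3, 10],
    "volatile phenol": [0.001, 0.0015, 0.002, 0.01],
}
LEVEL5_FACTOR = 2.0  # hypothetical: level 5 capped at 2x the level-4 limit


def level_area(level, limits):
    """(lower, upper) range of one index inside the given level area (1-5)."""
    lower = 0.0 if level == 1 else limits[level - 2]  # assumed 0 for level 1
    upper = limits[level - 1] if level <= 4 else LEVEL5_FACTOR * limits[3]
    return lower, upper


def generate_samples(per_level=200, seed=0):
    """Draw per_level samples uniformly inside each of the 5 level areas."""
    rng = random.Random(seed)
    samples, true_levels = [], []
    for level in range(1, 6):
        for _ in range(per_level):
            row = [rng.uniform(*level_area(level, limits))
                   for limits in UPPER_LIMITS.values()]
            samples.append(row)
            true_levels.append(level)
    return samples, true_levels


samples, true_levels = generate_samples()
print(len(samples))  # 1000 samples: 200 in each of the 5 level areas
```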

First, the decision makers' bias was not taken into account, i.e., the optimization model contained only restriction (8). The optimal solution computed with the algorithm in this article was P* = (0.2358, 0.3689, 0.5111, 0.3077, 0.4641, 0.4868), so the objective weights of the 6 indexes are W = (0.0556, 0.1361, 0.2612, 0.0947, 0.2154, 0.2370). The projection scores $z_i$ (i = 1, 2, ..., 1000) are computed by substituting P* into Eq. (3), and Fig. 1 plots them against the sample sequence number. Fig. 1 shows that all the samples fall into 5 classification levels, with projection score ranges of (0.025–0.0929), (0.1460–0.2325), (0.3062–0.4771), (0.6036–1.0873) and (1.3178–2.1648), respectively. A sample whose projection score lies in the lowest range is assigned classification level 1, the next range level 2, and so on. The classification levels of all 1000 samples agree exactly with their true levels; that is, the model classifies the generated samples with 100% accuracy. Fig. 2 plots projection score (horizontal axis) against classification level (vertical axis).

Table 1 Monitoring outcomes of 5 monitoring sites in Fuzhou, China

| Site | Total hardness (mg/L) | Nitrate (mg/L) | Nitrite (mg/L) | Sulfate (mg/L) | Hypermanganate (mg/L) | Volatile phenol (mg/L) |
|---|---|---|---|---|---|---|
| A | 145.32 | 1.76 | 0.014 | 74.18 | 1.98 | 0.014 |
| B | 98.28 | 13.12 | 0.005 | 107.28 | 2.04 | 0.016 |
| C | 122.19 | 2.11 | 0.02 | 48.67 | 3.81 | 0.009 |
| D | 51.12 | 11.08 | 0.021 | 43.97 | 2.18 | 0.001 |
| E | 144.07 | 4.75 | 0.026 | 20.59 | 1.88 | 0.001 |

The projection scores of the 5 objective samples are 0.4980, 0.5996, 0.4198, 0.2273 and 0.2037, respectively. Interpolating on Fig. 2 gives levels of 3.17, 3.97, 3.00, 2.00 and 2.00 for these samples, so the degree of water pollution at the 5 monitoring sites is ordered B > A > C > D > E. The ANN method (Luo et al., 2004; Jiang et al., 2007) also gave the order B > A > C > D > E, although in those studies groundwater quality was classified into only 4 levels, i.e., levels 4 and 5 of the groundwater quality criteria in Table 2 were combined into level 4.

Fig. 1 Projection scores vs. sequence number of samples.

Fig. 2 Projection score vs. classification level in Fig. 1.

Table 2 Groundwater quality assessment criteria

| Level | Total hardness (mg/L) | Nitrate (mg/L) | Nitrite (mg/L) | Sulfate (mg/L) | Hypermanganate (mg/L) | Volatile phenol (mg/L) |
|---|---|---|---|---|---|---|
| 1 | ≤ 150 | ≤ 2 | ≤ 0.001 | ≤ 50 | ≤ 1 | ≤ 0.001 |
| 2 | ≤ 300 | ≤ 5 | ≤ 0.01 | ≤ 150 | ≤ 2 | ≤ 0.0015 |
| 3 | ≤ 450 | ≤ 20 | ≤ 0.02 | ≤ 250 | ≤ 3 | ≤ 0.002 |
| 4 | ≤ 550 | ≤ 30 | ≤ 0.1 | ≤ 350 | ≤ 10 | ≤ 0.01 |
| 5 | > 550 | > 30 | > 0.1 | > 350 | > 10 | > 0.01 |

The objective weights from the projection model show that the most important index is nitrite and the least important is total hardness. However, different areas have different water quality requirements. Suppose volatile phenol is strictly controlled in this area, so that decision makers regard it as the most important index. This opinion is taken into account by adding the restriction $p_6 \ge p_j$ (j = 1, 2, 3, 4, 5) to the model, as in restriction (9). Optimization then gives the projection vector P' = (0.2361, 0.3683, 0.4995, 0.3069, 0.4641, 0.4995) and the corresponding weight vector W' = (0.0558, 0.1357, 0.2495, 0.0942, 0.2154, 0.2495). The samples were again classified into 5 classification levels, with projection score ranges of (0.0225–0.0933), (0.1466–0.2327), (0.3064–0.4769), (0.6046–1.0868) and (1.3163–2.1656), respectively. The projection score vs. classification level plot is shown in Fig. 3. The projection scores of the 5 objective samples were 0.5060, 0.6093, 0.4243, 0.2265 and 0.2028, and their levels were 3.23, 4.00, 3.00, 2.00 and 2.00, respectively.

Fig. 3 Projection score and classification level in Fig. 2.

3 Conclusions

(1) A new water quality classification method (a projection pursuit method based on a genetic algorithm) is proposed in this article. The method uses a large number of samples generated according to the groundwater quality criteria, which avoids the low precision that some other models suffer from when samples are few. The model can not only determine objective index weights but also evaluate water quality according to decision makers' bias.

(2) The genetic algorithm was combined with a traditional optimization method, which enables the genetic algorithm to solve optimization problems with various restrictions effectively. The case study shows that the optimization workload is greatly decreased.

(3) The model was applied to the groundwater quality assessment of 5 monitoring sites in Fuzhou, and the case study shows that it gives an appropriate water quality assessment.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (No. 20776023) and the Natural Science Foundation of Dalian Government (No. 2007J23JH015).

References

Bobbie J M, Webb R, Kristin H J, Scott D H, Christian P, Bob W W, 2005. An improved optimization algorithm and a Bayes factor termination criterion for sequential projection pursuit. Chemometrics and Intelligent Laboratory Systems, 77(1-2): 149–160.

Chipperfield A J, Fleming P J, 1995. The MATLAB genetic algorithm toolbox. In: IEE Colloquium on Applied Control Techniques Using MATLAB.

Friedman J H, Tukey J W, 1974. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C-23(9): 881–890.

Ifarraguerri A, Chang C I, 2000. Unsupervised hyperspectral image analysis with projection pursuit. IEEE Transactions on Geoscience and Remote Sensing, 38(6): 2529–2538.

Jiang B Q, Wang W S, Wen X C, 2007. An improved BP neural networks model on water quality evaluation. Computer Systems Applications, 9: 46–50.

Jin J L, Huang H M, Wei Y M, 2004. Comprehensive evaluation model for water quality based on combined weights. Journal of Hydroelectric Engineering, 23(3): 13–19.

Luo D G, Wang X J, Guo Q, 2004. The application of ANN realized by MATLAB to underground water quality assessment. Acta Scientiarum Naturalium Universitatis Pekinensis, 40(2): 296–302.

Michael D S, 2005. Statistical Modeling of High-Dimensional Nonlinear Systems: A Projection Pursuit Solution. Atlanta: Georgia Institute of Technology Press. 13–88.

Tian J H, Qiu L, Chai F X, 2005. Application of fuzzy recognition in comprehensive evaluation of water quality. Acta Scientiae Circumstantiae, 25(7): 950–953.

Zhang X L, Ding J, Li Z Y, 2000. Application of new projection pursuit algorithm in assessing water quality. China Environmental Science, 20(2): 187–189.