Journal of Environmental Sciences Supplement (2009) S154–S157
A new water quality assessment model based on projection pursuit technique
ZHANG Chi1, DONG Sihui2,∗
1. School of Civil and Hydraulic Engineering, Dalian University of Technology, Dalian 116024, China. E-mail: [email protected]
2. School of Environment Science and Engineering, Dalian Jiaotong University, Dalian 116028, China
Abstract: A new water quality assessment model was built based on the
projection pursuit technique. A large number of samples was used to
increase the model's precision. A new genetic algorithm combined with a
conventional constrained optimization method was proposed and applied to
the model optimization; it can deal effectively with global optimization
problems subject to various restrictions. The case study shows that the
model gives an appropriate assessment of water quality. Moreover, it can
determine the index weights in an objective way or provide information
for decision makers, which is difficult for other assessment methods.
Key words: water quality; assessment model; projection pursuit; genetic algorithm
Introduction
Water quality assessment is an important part of environmental
management and decision making. To assess water quality, a water
quality classification model is built first and then used to evaluate
the water quality level from the measured index values. The water
quality level reflects all of the influential indexes together, so an
appropriate assessment requires a multi-index assessment model.
So far, there is no universal water quality assessment index system
or assessment criterion in China. Water quality assessment models have
been proposed based on ANN (Luo et al., 2004; Jiang et al., 2007),
fuzzy pattern recognition (Tian et al., 2005), the combined weight
method (Jin et al., 2004), and the projection pursuit method (Zhang
et al., 2000). None of these models can determine objective weights
while also taking decision makers' bias toward some indexes into
account, or their precision is low because too few samples are used.
A new water quality assessment model based on the projection pursuit
technique is presented in this article. A large number of samples,
generated according to the water quality criteria, are used to build
the model. The model makes a linear projection of all indexes of the
samples and classifies the samples according to their projection
scores along the optimal projection direction vector. The optimal
projection direction vector reflects the importance of the indexes, so
objective index weights can be calculated from it, which is difficult
to achieve with other assessment methods. Furthermore, the model can
assess samples according to decision makers' bias.
* Corresponding author. E-mail: [email protected]
1 Assessment model based on projection pursuit
The basic idea of projection pursuit (Friedman and Tukey, 1974;
Bobbie et al., 2005) is to project data from a high-dimensional space
to a low-dimensional space according to certain rules; to measure,
with a projection objective function, how well the projection exposes
a certain structure; to find the optimal projection direction vector
of the objective function; and to analyze the structural
characteristics of the high-dimensional data with the projection
scores (Michael, 2005). The analyst can classify the samples by
inspecting the projection figure directly. The outcome is easy to
interpret and the operation is convenient. Optimizing the projection
objective function is the key to applying the model. It is a
constrained multivariable problem that is difficult to solve with a
traditional optimization method or a conventional genetic algorithm
alone. In this article, the genetic algorithm is combined with a
traditional optimization method to solve it.
The steps of the construction and application of water
quality assessment model are as follows:
(1) Construction of projection data.
Samples are generated according to the water quality criteria. That
is, a group of samples is generated stochastically in each level area,
where a level area is the region in which all indexes belong to the
same level. Every sample is composed of water quality index values
$X_{ij}$ and the corresponding real level $Y_i$. $X_{ij}$ (i = 1, 2,
..., n; j = 1, 2, ..., m) denotes the value of index j in sample i, n
is the number of samples, and m is the number of indexes. $Y_i$
denotes the real level of sample i. The more serious the water
pollution is, the higher the water quality level is: the least
polluted level is set to 1, the next to 2, and so on. Because the
indexes differ in magnitude and units, $X_{ij}$ must be normalized to
$x_{ij}$ so that $x_{ij}$ falls into the interval [0, 1]. Eq. (1) is
used for an index for which a larger value is worse, and Eq. (2) for
an index for which a smaller value is worse.
$$x_{ij} = \frac{X_{ij} - X_{j\min}}{X_{j\max} - X_{j\min}} \tag{1}$$
$$x_{ij} = \frac{X_{j\max} - X_{ij}}{X_{j\max} - X_{j\min}} \tag{2}$$
where $X_{j\min}$ and $X_{j\max}$ are the minimum and the maximum of
water quality index j over all samples, respectively.
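The normalization in Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name, the `larger_is_worse` flag, and the handling of a constant column are assumptions introduced here.

```python
def normalize_indexes(samples, larger_is_worse):
    """Min-max normalize each index column, following Eqs. (1) and (2).

    samples: list of rows, one row of m raw index values per sample.
    larger_is_worse: list of m booleans (hypothetical flag, not from the
        paper); True selects Eq. (1), False selects Eq. (2).
    """
    m = len(samples[0])
    col_min = [min(row[j] for row in samples) for j in range(m)]
    col_max = [max(row[j] for row in samples) for j in range(m)]
    normalized = []
    for row in samples:
        new_row = []
        for j, value in enumerate(row):
            span = col_max[j] - col_min[j]
            if span == 0:
                # Assumption: a constant column carries no information.
                new_row.append(0.0)
            elif larger_is_worse[j]:
                new_row.append((value - col_min[j]) / span)  # Eq. (1)
            else:
                new_row.append((col_max[j] - value) / span)  # Eq. (2)
        normalized.append(new_row)
    return normalized


# Toy data: column 0 is "larger is worse", column 1 is "smaller is worse".
raw = [[100.0, 8.0], [200.0, 6.0], [300.0, 2.0]]
scaled = normalize_indexes(raw, larger_is_worse=[True, False])
```

With either equation, a normalized value of 1 marks the worst observed value of an index and 0 the best.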
(2) Construction of the objective function for projection.
Suppose $P = (p_1, p_2, \ldots, p_m)$ is a normalized projection
vector. The values $x_{i1}, x_{i2}, \ldots, x_{im}$ are projected
linearly onto a one-dimensional variable with the following formula:
$$z_i = \sum_{j=1}^{m} p_j x_{ij}, \quad i = 1, 2, \ldots, n \tag{3}$$
where $z_i$ is the projection score. The samples can be classified
from the one-dimensional distribution figure of the projection dots
$z_i$ (i = 1, 2, ..., n). The general judging rules are as follows:
(a) If the projection dots of some samples lie closer to each other
than to the others in the distribution figure, they probably belong to
the same class. The closer they are, the better the projection is.
(b) If all projection dots congregate clearly into several dot groups
in the distribution figure, the projection is satisfactory.
Therefore, the projection objective function can be built
as follows:
$$Q = s(P) \times d(P) \tag{4}$$
where s(P) reflects the distance between classes, i.e., the standard
deviation of the projection scores, and d(P) reflects the density
within every class.
$$s(P) = \left( \frac{\sum_{i=1}^{n} (z_i - \bar{z})^2}{n - 1} \right)^{1/2} \tag{5}$$
$$d(P) = \sum_{i=1}^{n} \sum_{j=1}^{n} (R - r_{ij}) \times f(R - r_{ij}) \tag{6}$$
where $\bar{z}$ is the average of the series $z_1, z_2, \ldots, z_n$
and R is the density window radius. The choice of R should meet the
following demands: the number of dots included in a window should be
large enough, and R should not increase too quickly as n grows. After
a good amount of trial and error, we chose R = 0.1 s(P). The distance
between projection scores is $r_{ij} = |z_i - z_j|$ (i, j = 1, 2, ...,
n). $f(R - r_{ij})$ is the unit step function: $f(R - r_{ij}) = 1$
when $R - r_{ij} \geq 0$, and 0 otherwise.
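To make Eqs. (3) to (6) concrete, the sketch below evaluates the projection objective for a given direction. It is an illustrative implementation under stated assumptions, not the authors' program: the function names are hypothetical, and the distance is taken as the absolute difference between projection scores.

```python
import math


def projection_scores(x, p):
    """Eq. (3): z_i = sum_j p_j * x_ij for each normalized sample row."""
    return [sum(pj * xij for pj, xij in zip(p, row)) for row in x]


def projection_index(x, p, radius_factor=0.1):
    """Q = s(P) * d(P) from Eqs. (4)-(6), with R = radius_factor * s(P)."""
    z = projection_scores(x, p)
    n = len(z)
    z_mean = sum(z) / n
    # Eq. (5): sample standard deviation of the projection scores.
    s = math.sqrt(sum((zi - z_mean) ** 2 for zi in z) / (n - 1))
    radius = radius_factor * s
    # Eq. (6): sum (R - r_ij) over all pairs that fall inside the window.
    d = 0.0
    for zi in z:
        for zj in z:
            gap = radius - abs(zi - zj)  # assumes r_ij = |z_i - z_j|
            if gap >= 0:                 # unit step function f
                d += gap
    return s * d


# Toy data: two tight groups along the first index.
x = [[0.0, 0.3], [0.0, 0.7], [1.0, 0.2], [1.0, 0.9]]
q_first = projection_index(x, [1.0, 0.0])
q_second = projection_index(x, [0.0, 1.0])
```

In this toy data the first index separates the samples into two compact groups, so projecting onto it gives a larger Q than projecting onto the second index. The double loop costs O(n^2) per evaluation, which is acceptable for a sketch but worth vectorizing for 1000 samples.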
(3) Optimization of the objective function for projection.
Once the water quality indexes are fixed, the projection objective
function Q changes only with the projection direction P. The
projection direction P represents the structural characteristics of
the data in the samples. There must be a best projection direction P*
that is most likely to expose a certain structural characteristic of
the high-dimensional data. Therefore, the best projection direction
can be found by solving the following optimization model:
$$\max_{P} \; Q = s(P) \times d(P) \tag{7}$$
$$\text{s.t.} \quad \sum_{j=1}^{m} p_j^2 = 1 \tag{8}$$
If decision makers have a bias toward some indexes, corresponding
restrictions on those indexes can be added to the model. For example,
if decision makers consider the sixth index the most important, the
component $p_6$ of P should be the largest. This can be realized by
adding the following restriction to the model:
$$p_6 \geq p_j \quad (j = 1, 2, 3, \ldots, m;\ j \neq 6) \tag{9}$$
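The two kinds of restrictions can be checked with small helpers such as the following. This is a sketch with hypothetical function names; it only tests feasibility and says nothing about how the optimizer enforces the restrictions.

```python
import math


def to_unit_vector(p):
    """Rescale p so that the sum of squared components is 1, as in Eq. (8)."""
    norm = math.sqrt(sum(pj * pj for pj in p))
    if norm == 0:
        raise ValueError("projection vector must not be all zeros")
    return [pj / norm for pj in p]


def satisfies_bias(p, favored, tol=1e-9):
    """Restriction (9): the favored component is at least as large as all others.

    favored is a 0-based position, so the sixth index is favored=5.
    """
    return all(p[favored] >= pj - tol for j, pj in enumerate(p) if j != favored)


p = to_unit_vector([1.0, 2.0, 2.0])
ok_last = satisfies_bias(p, favored=2)    # ties are allowed
ok_first = satisfies_bias(p, favored=0)   # first component is the smallest
```

Here `ok_last` is true because a tie with another component is still feasible, while `ok_first` is false.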
The optimization model above is a complex nonlinear optimization
problem with a number of restrictions. It is difficult to solve with
traditional optimization methods or a conventional genetic algorithm
alone. A genetic algorithm (Bobbie et al., 2005; Ifarraguerri and
Chang, 2000) can solve global optimization problems effectively.
However, under restrictions, the convergence rate of a conventional
genetic algorithm is usually very slow and the optimization process
fluctuates strongly (Chipperfield and Fleming, 1995). A traditional
optimization method, in contrast, is capable of solving local
optimization problems with restrictions. Therefore, a synthetic
optimization method that combines the genetic algorithm with a
traditional optimization method is presented in this article and used
to find the optimal solution of the model.
The basic idea of the method is that each solution in the genetic
algorithm population, together with its objective value, is replaced
by the local optimum reached from it with a traditional optimization
method. That is, each solution generated by the genetic algorithm is
taken as the initial point for iteration in a traditional optimization
method such as the gradient method or Newton's method, which yields a
local optimal solution corresponding to that initial point. The local
optimal solution and its objective value are then returned to the
genetic algorithm to replace the original solution and its objective
value.
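The replacement scheme described above can be sketched as follows. This is not the authors' algorithm: the local step is a simple random-perturbation hill climb standing in for the gradient or Newton iteration, the genetic operators and all parameter values are arbitrary choices, only the unit-norm restriction (8) is enforced (by renormalization), and the objective in the demo is a toy function rather than Q.

```python
import math
import random


def unit(p):
    """Project p back onto the unit sphere so that restriction (8) holds."""
    norm = math.sqrt(sum(v * v for v in p))
    if norm == 0:
        return [1.0 / math.sqrt(len(p))] * len(p)
    return [v / norm for v in p]


def local_search(objective, p, rng, steps=30, step_size=0.1):
    """Stand-in for the gradient or Newton step: greedy random perturbation."""
    best, best_q = p, objective(p)
    for _ in range(steps):
        trial = unit([v + rng.gauss(0.0, step_size) for v in best])
        q = objective(trial)
        if q > best_q:
            best, best_q = trial, q
    return best, best_q


def hybrid_ga(objective, m, pop_size=20, generations=30, seed=0):
    """Genetic algorithm whose individuals are replaced by their local optima."""
    rng = random.Random(seed)
    population = [unit([rng.random() for _ in range(m)]) for _ in range(pop_size)]
    best, best_q = None, -math.inf
    for _ in range(generations):
        # Key step from the text: each solution and its objective value are
        # replaced by the local optimum reached from that solution.
        refined = [local_search(objective, p, rng) for p in population]
        refined.sort(key=lambda pair: pair[1], reverse=True)
        if refined[0][1] > best_q:
            best, best_q = refined[0]
        parents = [p for p, _ in refined[: pop_size // 2]]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            w = rng.random()
            child = [w * u + (1 - w) * v for u, v in zip(a, b)]  # crossover
            if rng.random() < 0.2:                               # mutation
                child[rng.randrange(m)] += rng.gauss(0.0, 0.3)
            children.append(unit(child))
        population = parents + children
    return best, best_q


# Toy objective (hypothetical): alignment with a known unit vector, maximum 1.
target = unit([1.0, 2.0, 3.0])
best_p, best_q = hybrid_ga(lambda p: sum(a * b for a, b in zip(p, target)), m=3)
```

To apply the sketch to the assessment model, the toy objective would be replaced by a function that evaluates Q for the normalized samples, and a bias restriction such as (9) would need to be handled inside the local step, for example by rejecting infeasible trial points.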
(4) Determination of index weights.
In step (3), the optimal projection direction P* reflects the
importance of the indexes. Because P* is a unit vector,
$p_1^2 + p_2^2 + \cdots + p_m^2 = 1$, so $p_1^2, p_2^2, \ldots, p_m^2$
can be taken as index weights, i.e.,
$W = (p_1^2, p_2^2, \ldots, p_m^2)$. If the model contains only
objective function (7) and restriction (8), i.e., no other
restrictions such as restriction (9), then
$p_1^2, p_2^2, \ldots, p_m^2$ can be taken as objective index weights.
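As a small worked example of step (4), the weights are simply the squared components of the optimal direction. The vector below is hypothetical and chosen only because its squares sum to 1.

```python
def index_weights(p_opt):
    """Square each component of the optimal unit direction to get weights."""
    return [pj * pj for pj in p_opt]


# Hypothetical unit direction for four indexes (not taken from the paper).
p_opt = [0.2, 0.4, 0.4, 0.8]
weights = index_weights(p_opt)
most_important = weights.index(max(weights))  # 0-based position of the top index
```

The weights come out as approximately 0.04, 0.16, 0.16 and 0.64, so the fourth index dominates the assessment in this example.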
(5) Classification of the samples.
The projection score $z_i$ is calculated by substituting the optimal
P* into Eq. (3). The larger the projection score of a sample is, the
higher its water quality level is. The samples can be classified
according to the differences in $z_i$. This gives a figure relating
the projection score to the classification level of all samples, with
the projection score on the horizontal axis and the classification
level on the vertical axis.
(6) Assessment of the objective samples.
The projection scores of the objective samples are computed from the
optimal P* and the index values of the objective samples. The
classification levels of the objective samples are then obtained by
interpolation on the relation figure from step (5).
2 Case study
The data in Table 1 are the groundwater quality monitoring results
from 5 monitoring sites in Fuzhou, China, in May 1999. According to
the characteristics of this area, total hardness, nitrate, nitrite,
sulfate, hypermanganate and volatile phenol were chosen as assessment
indexes. The groundwater quality assessment criteria were taken from
Groundwater Quality Criterions (GB/T14848-93) and are listed in
Table 2.
If the number of samples is too small, the outcome, i.e., the optimal
projection vector, varies with the randomness of the samples. The
study showed that generating 150 samples in every level area makes the
outcome steady. In this article, two hundred samples were generated
stochastically in each level area, and every sample was composed of
the six index values and the real level value. Because the water
quality is classified into 5 levels according to the criteria in
Table 2, 1000 samples were obtained. The index values were normalized
with Eq. (1) or (2).
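One way to generate such samples is sketched below. The level limits are copied from Table 2, but the sampling details are assumptions that the paper does not state: values are drawn uniformly within a level, level 1 starts at 0, and level 5 is capped at twice the level 4 limit. The function and parameter names are hypothetical.

```python
import random

# Upper limits of levels 1-4 for each index, copied from Table 2 (mg/L).
LIMITS = {
    "total_hardness": [150, 300, 450, 550],
    "nitrate": [2, 5, 20, 30],
    "nitrite": [0.001, 0.01, 0.02, 0.1],
    "sulfate": [50, 150, 250, 350],
    "hypermanganate": [1, 2, 3, 10],
    "volatile_phenol": [0.001, 0.0015, 0.002, 0.01],
}


def generate_samples(limits, per_level=200, top_factor=2.0, seed=0):
    """Draw per_level random samples inside each of the five level areas.

    Assumptions (not from the paper): uniform sampling within a level,
    a lower bound of 0 for level 1, and an upper cap of
    top_factor * (level 4 limit) for level 5.
    """
    rng = random.Random(seed)
    samples = []
    for level in range(1, 6):
        for _ in range(per_level):
            row = []
            for bounds in limits.values():
                edges = [0.0] + list(bounds) + [top_factor * bounds[-1]]
                # All indexes of one sample are drawn from the same level.
                row.append(rng.uniform(edges[level - 1], edges[level]))
            samples.append((row, level))
    return samples


samples = generate_samples(LIMITS)
```

Each entry pairs six raw index values with the real level of the sample, which is the form needed for normalization with Eq. (1) or (2).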
Table 1 Monitoring outcomes of 5 monitoring sites in Fuzhou, China
Site  Total hardness  Nitrate  Nitrite  Sulfate  Hypermanganate  Volatile phenol
      (mg/L)          (mg/L)   (mg/L)   (mg/L)   (mg/L)          (mg/L)
A     145.32          1.76     0.014    74.18    1.98            0.014
B     98.28           13.12    0.005    107.28   2.04            0.016
C     122.19          2.11     0.02     48.67    3.81            0.009
D     51.12           11.08    0.021    43.97    2.18            0.001
E     144.07          4.75     0.026    20.59    1.88            0.001
First, the bias of the decision maker was not taken into account,
i.e., restriction (8) was the only restriction in the optimization
model. The optimal solution computed by the algorithm in this article
was P* = (0.2358, 0.3689, 0.5111, 0.3077, 0.4641, 0.4868). The
objective weights of the 6 indexes follow as W = (0.0561, 0.1361, 0.2612,
0.0947, 0.2154, 0.2370). The projection scores $z_i$ (i = 1, 2, ...,
1000) are computed by substituting P* into Eq. (3). Fig. 1 plots the
projection scores against the sequence numbers of the samples. From
Fig. 1, all the samples can be classified into 5 classification
levels. The level ranges of the projection scores are (0.025–0.0929),
(0.1460–0.2325), (0.3062–0.4771), (0.6036–1.0873) and
(1.3178–2.1648), respectively. The level whose projection scores lie
in the lowest range is labeled 1, the next 2, and so on. The
classification levels of all 1000 samples are the same as their real
levels, that is, the accuracy of the model on the samples is 100%. The
relation between projection score and classification level is shown in
Fig. 2, with projection scores on the horizontal axis and levels on
the vertical axis.
The projection scores of the objective samples are 0.4980, 0.5996,
0.4198, 0.2273, and 0.2037, respectively. Interpolating on Fig. 2
gives levels of 3.17, 3.97, 3.00, 2.00 and 2.00 for the 5 objective
samples, respectively. The order of water pollution degree at the 5
monitoring sites is B > A > C > D > E. The result of the ANN method
(Luo et al., 2004; Jiang et al., 2007) was also in the order B > A >
C > D > E, but the groundwater quality was classified into 4 levels in
that article; that is, level 4 and level 5 of the groundwater quality
criteria in Table 2 were combined into level 4.
Fig. 1 Projection scores vs. sequence number of samples.
Fig. 2 Projection score vs. classification level in Fig. 1.
Table 2 Groundwater quality assessment criterions
Level  Total hardness  Nitrate  Nitrite  Sulfate  Hypermanganate  Volatile phenol
       (mg/L)          (mg/L)   (mg/L)   (mg/L)   (mg/L)          (mg/L)
1      ≤ 150           ≤ 2      ≤ 0.001  ≤ 50     ≤ 1             ≤ 0.001
2      ≤ 300           ≤ 5      ≤ 0.01   ≤ 150    ≤ 2             ≤ 0.0015
3      ≤ 450           ≤ 20     ≤ 0.02   ≤ 250    ≤ 3             ≤ 0.002
4      ≤ 550           ≤ 30     ≤ 0.1    ≤ 350    ≤ 10            ≤ 0.01
5      > 550           > 30     > 0.1    > 350    > 10            > 0.01
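The interpolation of levels from projection scores can be sketched as below. The rule itself is an assumption, since the paper only says that levels are read from Fig. 2 by interpolation: a score inside a level range receives that level exactly, and a score in the gap between two ranges is interpolated linearly between the adjacent levels. The function name is hypothetical; the ranges and site scores are the ones reported for the case without decision makers' bias.

```python
def interpolate_level(score, level_ranges):
    """Map a projection score to a (possibly fractional) level.

    level_ranges: list of (low, high) score ranges for levels 1, 2, ...
    Assumption: scores inside a range map to that level; scores in the gap
    between two ranges are interpolated linearly between adjacent levels.
    """
    if score <= level_ranges[0][0]:
        return 1.0
    for k, (low, high) in enumerate(level_ranges, start=1):
        if low <= score <= high:
            return float(k)
        if k < len(level_ranges):
            next_low = level_ranges[k][0]  # lower end of level k + 1
            if high < score < next_low:
                return k + (score - high) / (next_low - high)
    return float(len(level_ranges))


# Level ranges and site scores as reported in the case study.
ranges = [(0.025, 0.0929), (0.1460, 0.2325), (0.3062, 0.4771),
          (0.6036, 1.0873), (1.3178, 2.1648)]
site_scores = {"A": 0.4980, "B": 0.5996, "C": 0.4198, "D": 0.2273, "E": 0.2037}
levels = {site: round(interpolate_level(z, ranges), 2)
          for site, z in site_scores.items()}
```

Under this assumed rule the sketch gives 3.17, 3.97, 3.0, 2.0 and 2.0 for sites A to E, which agrees with the levels reported for the five monitoring sites.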
From the objective weights obtained from the projection model, the
most important index is “nitrite” and the least important one is
“total hardness”. However, different areas have different water
quality demands. Suppose “volatile phenol” is strictly controlled in
this area, so that decision makers suggest that the index “volatile
phenol” be the most important. The decision makers' opinion can be
taken into account by adding restrictions to the model, i.e., p6 > pj
(j = 1, 2, 3, 4, 5). Through optimization, the optimal projection
vector was P' = (0.2361, 0.3683, 0.4995, 0.3069, 0.4641, 0.4995), and
the corresponding weight vector was W' = (0.0558, 0.1357, 0.2495,
0.0942, 0.2154, 0.2495). The samples were again classified into 5
classification levels. The level ranges of the projection scores were
(0.0225–0.0933), (0.1466–0.2327), (0.3064–0.4769), (0.6046–1.0868),
and (1.3163–2.1656), respectively. The relation between projection
score and classification level is shown in Fig. 3. The projection
scores of the 5 objective samples were 0.5060, 0.6093, 0.4243, 0.2265
and 0.2028, respectively, and their levels were 3.23, 4.00, 3.00, 2.00
and 2.00, respectively.
Fig. 3 Projection score and classification level in Fig. 2.
3 Conclusions
(1) A new water quality classification method (a projection pursuit
method based on a genetic algorithm) was put forward in this article.
The method uses a large number of samples generated according to the
groundwater quality criteria, which avoids the low precision caused by
too few samples in some other models. The model can not only determine
objective index weights for the samples but also evaluate water
quality according to the decision makers' bias.
(2) The genetic algorithm was combined with a traditional optimization
method, which enables it to solve optimization problems with various
restrictions effectively. The case study shows that the optimization
workload is decreased greatly.
(3) The model was applied to the groundwater quality assessment of 5
monitoring sites in Fuzhou. The case study shows that the model gives
an appropriate water quality assessment.
Acknowledgments
This work was supported by the National Natural
Science Foundation of China (No. 20776023) and the
Natural Science Foundation of Dalian Government (No.
2007J23JH015).
References
Bobbie J M, Webb R, Kristin H J, Scott D H, Christian P, Bob W W, 2005. An improved optimization algorithm and a Bayes factor termination criterion for sequential projection pursuit. Chemometrics and Intelligent Laboratory Systems, 77(1-2): 149–160.
Chipperfield A J, Fleming P J, 1995. The MATLAB genetic algorithm toolbox, applied control techniques using MATLAB. IEE Colloquium.
Friedman J H, Tukey J W, 1974. A projection pursuit algorithm for exploratory data analysis. IEEE Trans on Computer, 19(4): 224–227.
Ifarraguerri A, Chang C I, 2000. Unsupervised hyperspectral image analysis with projection pursuit. IEEE Trans on Geoscience and Remote Sensing, 38(6): 2529–2538.
Jiang B Q, Wang W S, Wen X C, 2007. An improved BP neural networks model on water quality evaluation. Computer Systems Applications, 9: 46–50.
Jin J L, Huang H M, Wei Y M, 2004. Comprehensive evaluation model for water quality based on combined weights. Journal of Hydroelectric Engineering, 23(3): 13–19.
Luo D G, Wang X J, Guo Q, 2004. The application of ANN realized by MATLAB to underground water quality assessment. Acta Scientiarum Naturalium Universitatis Pekinensis, 40(2): 296–302.
Michael D S, 2005. Statistical Modeling of High-Dimensional Nonlinear Systems: A Projection Pursuit Solution. Atlanta: Georgia Institute of Technology Press. 13–88.
Tian J H, Qiu L, Chai F X, 2005. Application of fuzzy recognition in comprehensive evaluation of water quality. Acta Scientiae Circumstantiae, 25(7): 950–953.
Zhang X L, Ding J, Li Z Y, 2000. Application of new projection pursuit algorithm in assessing water quality. China Environmental Science, 20(2): 187–189.