Robust Adaptive Partial Least Squares Modeling of a Full-Scale Industrial Wastewater Treatment Process

Hae Woo Lee, Min Woo Lee, and Jong Moon Park*

Advanced Environmental Biotechnology Research Center, Department of Chemical Engineering/School of Environmental Science and Engineering, POSTECH, San 31 Hyoja-Dong, Nam-Gu, Pohang, Kyungbuk 790-784, Republic of Korea

A new robust adaptive partial least squares (PLS) scheme was proposed for the prediction and monitoring of an industrial wastewater treatment process that has highly complex and time-varying process dynamics. The essential feature of this method is that all incoming process data are preliminarily screened on the basis of a combined monitoring index, and each observation identified as an outlier is either simply eliminated (hard threshold) or suppressed by using a weight function (soft threshold) prior to the model update. To assess the feasibility of the proposed scheme, various PLS modeling approaches, including conventional ones, were evaluated and their results were compared. While the conventional approaches clearly revealed their limitations, such as the inflexibility of the model to process changes and misleading model updates caused by high-leverage outliers, most robust adaptive PLS approaches based on the proposed scheme performed fairly well in both prediction and monitoring. Among the tested methods, the robust adaptive PLS method using the Fair weight function showed the best performance while reasonably maintaining the robustness of the PLS model.

1. Introduction

Multivariate statistical process control (MSPC) has received considerable attention along with the rapid advances in on-line monitoring and computer technology. In MSPC, complex process behaviors can be efficiently interpreted by compressing the high-dimensional space of process variables into a low-dimensional latent variable space that retains the essential information of the raw data. The partial least squares (PLS) method is one of the most popular MSPC techniques. It can effectively derive the relationship between the process input and output variables, which are usually strongly collinear and noisy. The PLS method also provides powerful process monitoring tools: abnormal operations can be easily identified on the basis of statistical monitoring indices such as Hotelling's $T^2$ and the squared prediction error (SPE). Several reports have shown that the PLS method can be applied to the modeling and monitoring of various chemical and biological processes (1-3). However, the conventional static PLS method has also been criticized for its basic assumption of steady state, which contradicts the fact that most actual processes have a nonstationary and time-varying dynamic nature.

To overcome this problem, adaptive PLS methods have been proposed. In an adaptive PLS method, the model is recursively updated using newly incoming data so that slowly changing process behavior can be effectively reflected in the model. While several adaptive PLS algorithms have been proposed and successfully applied to the modeling and monitoring of various time-varying processes (4-7), it has been pointed out that the PLS model can seriously deteriorate when a considerable number of abnormal process data are used in the model update procedure (5, 8). Because the accuracy of the PLS model is highly dependent on the statistical information contained in the operation data, it is crucial that only data representing the relevant variance of normal process dynamics be used for the model update in order to maintain the robustness of the PLS model.

Robustness is a concern for all MSPC techniques. MSPC models are derived from a database that usually contains some outliers originating from sensor faults, missing values, process disturbances, malfunction of instruments, process shut-down, and so on. These outliers can distort the distribution of multivariate data and often lead to a deceptive result. To minimize the adverse effect of outliers, several authors have proposed robust multivariate methods. In robust multivariate methods, robust statistics such as the median and the median absolute deviation are often used instead of the mean and the variance, respectively. Rousseeuw (9) introduced the least median of squares method as an extension of the median methodology to obtain a robust regression model. Several robust techniques, including the minimum volume ellipsoid, ellipsoidal multivariate trimming, and the minimum covariance determinant, have also been developed for the robust estimation of the covariance matrix (9-11).

For the static PLS model, Wakeling and Macfie (12) proposed a simple and comprehensive robust PLS algorithm, the so-called iteratively reweighted PLS (IRPLS). This method uses the regression residual to examine whether an observation is an outlier and calculates weight values to suppress the outliers in the model building step. The IRPLS has been modified by several authors and widely used as the most representative robust PLS method (13-15). According to our literature survey, however, there have been very few reports concerning robust adaptive PLS methods, even though the robustness problem is much more important in an adaptive model, because the continuous incursion of incoming data without detection of outliers can seriously deteriorate the model structure.

In this study, an industrial anaerobic filter process that shows highly complicated dynamic behaviors was selected as the model process, and a new scheme of robust adaptive PLS method was proposed for the modeling and monitoring of the process. To confirm the feasibility of the proposed robust adaptive PLS method, the performances of a conventional static PLS method (16) and an adaptive PLS method (4) were also evaluated and compared with each other.

* To whom correspondence should be addressed. Tel.: +82-54-279-2275. Fax: +82-54-279-8299. E-mail: [email protected].

Ind. Eng. Chem. Res. 2007, 46 (3), 955-964

10.1021/ie061094+ CCC: $37.00 © 2007 American Chemical Society. Published on Web 12/30/2006

2. Description of the Model Process

The model process is a full-scale down-flow anaerobic filter process that treats the wastewater discharged from a purified terephthalic acid manufacturing plant (Samsung Petrochemical Co. Ltd., Ulsan, Korea). A detailed schematic diagram of the process is shown in Figure 1. The anaerobic filter process was designed for the preliminary conversion of organic pollutants in the wastewater into methane gas, reducing the organic loading to the following activated sludge process. For the stable operation of the anaerobic filter process, the organic loading rate and the pH of the feed stream are manually controlled by changing the flows of both high-strength wastewater and sodium hydroxide added to the raw wastewater feed. The feed temperature is also controlled at 38 °C by a cooling tower. The effluent is recycled to the front of the reactor for mixing and dilution of the feed wastewater. An operation database consisting of the online-measured variables shown in Table 1 was available, which was automatically accumulated by a data acquisition system (Honeywell, Morristown, NJ). Other detailed process descriptions can be found elsewhere (17).

3. Model Identification

(3.1) General Modeling Approach. In all subsequent model identification, hourly average data sets with 4369 total observations were used. All variables considered in the model identification were classified into X and Y blocks, as shown in Table 1. The predictor block X consisted of 10 online-measured variables and 5 additional variables that could be calculated from the online-measured variables. It was expected that the inclusion of the additional variables could enhance the model performance, because they are closely related to the actual dynamics of the anaerobic filter process. The predicted block Y consisted of the total oxygen demand (TOD) concentration of the effluent and the production rate of methane gas, which are directly related to the performance of the model process. All data were autoscaled before being used in the model identification.
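As an illustration of this data preparation, the following minimal Python sketch computes the five calculated predictors from the formulas listed in Table 1 and autoscales a single variable. The function names and the `reactor_volume` argument are hypothetical (the reactor volume is not stated here), and the use of the sample standard deviation in the autoscaling step is an assumption.

```python
from statistics import mean, stdev


def derived_variables(q_in, tod_in, q_r, tod_out, reactor_volume):
    """Compute the five calculated predictors of Table 1 for one observation.

    reactor_volume (m3) is a hypothetical input; its value is not given in the text.
    """
    # Actual influent TOD after mixing with the recycled effluent (mg/L)
    r_tod_in = (q_in * tod_in + q_r * tod_out) / (q_in + q_r)
    return {
        "rTODin": r_tod_in,
        "TODload": q_in * tod_in,              # g of TOD/h, since mg/L equals g/m3
        "CT": reactor_volume / (q_in + q_r),   # contact time (h)
        "HRT": reactor_volume / q_in,          # hydraulic retention time (h)
        "rTODload": r_tod_in * (q_in + q_r),   # actual TOD loading rate (g of TOD/h)
    }


def autoscale(column):
    """Center a variable to zero mean and scale it to unit sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]
```

In an adaptive setting the centering and scaling statistics would presumably be recomputed along with the model; that detail is not covered by this sketch.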

To construct a dynamic PLS model, autoregressive with exogenous inputs (ARX) and finite impulse response (FIR) modeling approaches were considered. These approaches have been widely adopted in data-driven dynamic model identification (18-20). A preliminary study revealed, however, that the ARX modeling approach tended to overemphasize the autoregression terms, making the model insensitive to process changes (4). We therefore adopted the FIR modeling approach given by eq 1 for all subsequent model identification. In eq 1, x and y represent the input variable vector and the output variable vector, respectively, and $n_x$ is the time lag for the input variables (4, 21). The applied value of $n_x$ was 1, which was determined by using Akaike's information criterion (22, 23). The model was designed to perform a one-step-ahead prediction of the output variables. All programs for the model identification were implemented in MATLAB by using the PLS Toolbox (24).
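The FIR structure of eq 1 amounts to pairing each target y(t+1) with the current and lagged input vectors. A minimal Python sketch of that bookkeeping follows; the function name `build_fir_dataset` and the list-of-rows data layout are illustrative assumptions, not part of the original MATLAB implementation.

```python
def build_fir_dataset(x_rows, y_rows, n_x=1):
    """Pair lagged inputs [x(t), x(t-1), ..., x(t-n_x)] with the target y(t+1)."""
    predictors, targets = [], []
    # t needs n_x past samples behind it and one future sample ahead of it
    for t in range(n_x, len(x_rows) - 1):
        lagged = []
        for lag in range(n_x + 1):
            lagged.extend(x_rows[t - lag])   # append x(t - lag) to the predictor row
        predictors.append(lagged)
        targets.append(list(y_rows[t + 1]))  # one-step-ahead output
    return predictors, targets
```

With $n_x$ = 1 and the 15 predictors of Table 1, each predictor row built this way would hold 30 values.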

$$y(t+1) = f\big(x(t),\, x(t-1),\, x(t-2),\, \ldots,\, x(t-n_x)\big) \qquad (1)$$

Figure 1. Schematic diagram of the industrial anaerobic filter process.

Table 1. Variables Used in the Modeling of an Industrial Anaerobic Filter Process

notation      description

predictor variable (X)
Qin           flow rate of influent (m³/h)
TODin         TOD of influent (mg/L)
pHin          pH of influent
pHout         pH of effluent
Tequ          temperature of equalization tank (°C)
Tfeed         temperature of feed flow (°C)
Qr            recycle flow rate (m³/h)
Qhigh         feed rate of high-strength wastewater (m³/h)
Qgas          production rate of biogas (m³/h)
QNaOH         feed rate of sodium hydroxide solution (ton/h)
rTODin(a)     actual TOD of influent (mg/L) = (Qin TODin + Qr TODout)/(Qin + Qr)
TODload(a)    TOD loading rate (g of TOD/h) = Qin TODin
CT(a)         contact time (h) = (volume of reactor)/(Qin + Qr)
HRT(a)        hydraulic retention time (h) = (volume of reactor)/Qin
rTODload(a)   actual TOD loading rate (g of TOD/h) = rTODin (Qin + Qr)

predicted variable (Y)
TODout        TOD of effluent (mg/L)
QCH4          production rate of methane gas (m³/h)

(a) Variables were calculated from the measured variables on the basis of knowledge of the process.

(3.2) Static PLS Model. A static PLS model was developed using the nonlinear iterative partial least squares (NIPALS) algorithm, which is most widely used to determine the model parameters such as the score, loading, and weight vectors and the regression coefficients (16). The first 1000 observations were used for model calibration, and the remainder was used for model validation. To determine the optimum number of latent variables, the leave-one-out cross-validation technique was used (25), and 3 was selected as the optimum value on the basis of Wold's R criterion (26). To investigate the process monitoring ability of the static PLS model, two typical statistical indices, $T^2$ and SPE, were used. $T^2$ is a measure of variations in the principal component subspace, whereas SPE is a measure of variations in the residual subspace. During the construction of the PLS model, the $T^2$ and SPE values for all observations were calculated (1, 2), and their confidence limits were determined from their distributions (27).
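For a single observation, the two monitoring indices can be sketched in Python as below. This uses the common textbook definitions, which are assumed here rather than taken from the text: $T^2$ as the sum of squared scores, each divided by the calibration variance of its latent variable, and SPE as the squared norm of the residual left after projection onto the model. The function and argument names are hypothetical.

```python
def hotelling_t2(scores, score_variances):
    """T2: squared scores, each normalized by its latent variable's calibration variance."""
    return sum(t * t / var for t, var in zip(scores, score_variances))


def squared_prediction_error(observed, reconstructed):
    """SPE: squared norm of the residual between an observation and its model reconstruction."""
    return sum((o - r) ** 2 for o, r in zip(observed, reconstructed))
```

A large $T^2$ with a small SPE then indicates an unusual observation that still follows the modeled correlation structure, whereas a large SPE indicates that the correlation structure itself is violated.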

(3.3) Adaptive PLS Model. Among the various adaptive PLS methods, we adopted the blockwise recursive PLS algorithm proposed by Qin (4) as a basic skeleton because it is very efficient in updating the PLS model with respect to computational cost and memory. First, an initial PLS model was obtained in the same way as the static PLS model described above. Then, the PLS model was recursively updated with the moving window concept whenever a block of newly incoming data became available. In Qin's original algorithm, the maximum possible number of latent variables, which corresponds to the rank of the X block, was used in the model update. It was later shown, however, that this may result in poor prediction ability of the updated model because of overfitting (28). To avoid this, we refined the updated model once more by applying the cross-validation technique after every model update. The size of the moving window was 1000, and the size of the subblock for the model update was 100. These sizes were chosen heuristically because no fundamental guideline exists, although some discussion of this topic is available in the literature (7, 28). For the application of the adaptive PLS method to process monitoring, we used the adaptive confidence limits proposed by Wang et al. (7) instead of the constant limits described previously for the static PLS model. Whenever a new observation became available, the confidence limits of $T^2$ and SPE were recalculated.
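The moving-window schedule described above (a window of 1000 observations, updated in subblocks of 100) can be sketched in Python as follows. This shows only the data bookkeeping under assumed names; Qin's blockwise recursive update itself, which merges submodels rather than refitting on raw data, is not reproduced here.

```python
from collections import deque


def moving_window_updates(stream, window_size=1000, block_size=100):
    """Yield the current window once it is first full, then after every new sub-block."""
    window = deque(maxlen=window_size)  # the oldest observations drop out automatically
    since_update = 0
    initialized = False
    for obs in stream:
        window.append(obs)
        if not initialized:
            if len(window) == window_size:
                initialized = True
                yield list(window)      # data for the initial model
            continue
        since_update += 1
        if since_update == block_size:
            since_update = 0
            yield list(window)          # data available for one model update
```

Each yielded window would be handed to whatever model update routine is in use; with 4369 hourly observations this schedule gives one initial model followed by 33 updates.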

(3.4) Robust Adaptive PLS Model. Figure 2 shows the detailed flow chart of the proposed robust adaptive PLS method. The overall scheme is very similar to that of the adaptive PLS method described previously. The distinct feature of the robust adaptive PLS method is that abnormal incoming process operation data (outliers) are preliminarily screened to maintain the robustness of the PLS model during the model update. To screen the outliers, we adopted the combined monitoring index first proposed by Yue and Qin (29):

$$\varphi = \frac{\mathrm{SPE}}{\delta^2} + \frac{T^2}{\chi_l^2} \qquad (2)$$

where $\varphi$ is the combined monitoring index, $\delta^2$ is the 99% confidence limit of the SPE, and $\chi_l^2$ is the 99% confidence limit of $T^2$. The distribution of the combined monitoring index can be approximated by a $\chi^2$ distribution, and thus the statistical confidence limit (99%) of the combined monitoring index, $\zeta^2$, can also be calculated (18). Each newly incoming observation can be tested as to whether it is an outlier on the basis of its corresponding $\varphi$ and $\zeta^2$ values. It should be noted that in this robust adaptive PLS method $\zeta^2$ was also recalculated whenever a new observation became available.

Figure 2. Flow chart of continuous model updating in the robust adaptive PLS algorithm.

The presented robust adaptive PLS method can be categorized into two different approaches according to the rejection threshold used for the screening of abnormal data. In the hard threshold approach, all data identified as outliers are simply eliminated, and thus only normal data are used in the model update. In the soft threshold approach, on the other hand, all incoming data including outliers are used in the model update. The soft threshold approach was designed on the basis of Pell's idea (15), which used a weight function to suppress the adverse effect of outliers in identifying a static PLS model. In this study, however, the weight was calculated from the combined monitoring index instead of the cross-validated residual used by Pell, because the combined index is a more meaningful indicator that can discriminate outliers in both the prediction and the monitoring aspects of the PLS model. The soft threshold approach can be further classified according to the weight function used for the calculation of the weight value. In this study, the three weight functions listed in Table 2 were tested; they were modified from the weight functions used by Pell (15). The calculated weight for an observation ranges from 0 to 1, depending on its combined monitoring index and the confidence limit. Although the dependency differs according to the weight function, the calculated weight converges to zero as the combined monitoring index increasingly exceeds the confidence limit. Multiplying a high-leverage outlier by a weight close to zero makes it behave like a normal observation, so that the robustness of the PLS model can be maintained during the model update. Each weight function has its own tuning parameter c. We determined these values empirically to provide the best performance in terms of both prediction and monitoring. The tuning parameter values used in this study are also presented in Table 2.

Table 2. Weight Functions Used in the Robust Adaptive PLS with Soft Threshold

category    weighting function                                                                              constant, c
Cauchy      $\omega_i = 1/[1 + (\varphi_i/(c\zeta^2))^2]$                                                    3.94
Fair        $\omega_i = 1/[1 + \varphi_i/(c\zeta^2)]^2$                                                      3.6
Bisquare    $\omega_i = [1 - (\varphi_i/\zeta^2)^2/c^2]^2$ for $\varphi_i \le c$; 0 for $\varphi_i > c$      8.41
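The screening step can be illustrated with a short Python sketch of the combined index of eq 2 together with the hard threshold and two of the soft threshold weights of Table 2. The function names are hypothetical, the hard threshold rule (an observation is treated as an outlier when its index exceeds the limit $\zeta^2$) is an assumed reading of the text, and the Bisquare form is left out of the sketch.

```python
def combined_index(spe, t2, spe_limit, t2_limit):
    """Combined monitoring index: SPE and T2, each scaled by its own 99% confidence limit."""
    return spe / spe_limit + t2 / t2_limit


def hard_threshold_weight(phi, zeta2):
    """Hard threshold: keep (1.0) or eliminate (0.0); assumes outlier means phi > zeta2."""
    return 1.0 if phi <= zeta2 else 0.0


def cauchy_weight(phi, zeta2, c=3.94):
    """Cauchy weight of Table 2 with its reported tuning constant."""
    return 1.0 / (1.0 + (phi / (c * zeta2)) ** 2)


def fair_weight(phi, zeta2, c=3.6):
    """Fair weight of Table 2 with its reported tuning constant."""
    return 1.0 / (1.0 + phi / (c * zeta2)) ** 2
```

Both soft weights equal 1 for an index of zero and decay smoothly toward 0 as the index grows relative to $c\zeta^2$, which is what lets moderately unusual but informative observations still contribute to the update.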

4. Results and Discussion

(4.1) Performance of Static PLS Method. Figure 3 shows the prediction and monitoring results obtained by applying the static PLS method, which reveal the fundamental drawback of the static PLS method. In general, the prediction accuracy of the model for the validation part was worse than that for the calibration part, as is typical of most data-driven modeling approaches. A particular feature is that the prediction accuracy of the model decreased remarkably after 2300 h, especially for the effluent TOD. This failure in the prediction indicates that after 2300 h the process states might have changed significantly from the normal states considered in the model calibration step. Because the static PLS model is usually derived from a limited historical database, it cannot describe well the correlations between process variables that are not reflected in the calibration data sets. The occurrence of these severe process changes can also be recognized in the monitoring charts. As can be seen in the SPE and $T^2$ plots, the SPE values continuously violated the confidence limit after 2300 h, while the $T^2$ values showed a relatively stable profile. The continuous violation of the SPE confidence limit implies that the process changes were not simple outliers originating from temporary sensor faults or malfunction of instruments but new sources of correlation that should be further reflected in the model.

Figure 3. Prediction results and monitoring charts obtained by using the static PLS method. Gray circles: measured values. Solid line: predicted values. Short dashed line: 95% confidence limit. Long dashed line: 99% confidence limit.

Figure 4 shows the actual profiles of the process variables that were identified, by using the SPE contribution plot, to be closely related to the process changes. After 2300 h the amount of sodium hydroxide added was abruptly decreased, whereas the amount of extremely high-strength wastewater added was increased. It should be noted that both are the major manipulated variables used to control the performance of the anaerobic filter process. Indeed, these control actions were strategically adopted by the field operators because the anaerobic filter process had suffered from a media plugging problem that seemed to be caused by the continuous feeding of suspended solids, mainly consisting of undissolved terephthalic acid. To resolve this media plugging problem, the suspended solids were removed through preliminary sedimentation. However, this lowered the acidity and the TOD concentration of the feed stream, so that control actions to compensate for them were inevitable.

The adoption of a new control strategy and the consequent changes of process states are common in most industrial processes. Because the static PLS method has no scheme to detect these types of process changes and to reflect them in the model, it does not seem appropriate for the long-term prediction and monitoring of an industrial process.

(4.2) Performance of Adaptive PLS Method. Figure 5 shows the prediction and monitoring results obtained by applying the adaptive PLS method. Compared with the static PLS method, the adaptive PLS method showed relatively enhanced prediction ability in spite of the introduction of the intentional process changes explained previously. The adaptive PLS method updates the model periodically, so that the intentional process changes could be effectively reflected in the model as time proceeded. The adaptation of the model to the process changes can be identified more clearly in the monitoring charts, which show the adaptive confidence limits of $T^2$ and SPE. In general, both confidence limits were updated well, providing more reasonable statistical guidelines to discriminate the outliers.

As can be seen in the latter part of Figure 5, however, the model performance deteriorated seriously after around 3500 h. The prediction accuracies for the effluent TOD and the methane production rate declined greatly, showing somewhat insensitive time profiles despite considerable variations of the model input variables. Furthermore, the 95 and 99% confidence limits of the SPE increased dramatically during this period. The fault identification method based on $T^2$ and SPE contribution plots revealed that these sudden deteriorations of the model performance were closely related to a sensor fault of the pH meter that measured the effluent pH of the anaerobic filter process. During the whole period, the actual pH values of the effluent were normally maintained around 7.0 with a very small variance, but the online-measured pH values from 3486 to 3489 h were 14.0 because of the sensor fault. When these contaminated data sets (outliers), with an abnormally increased variance of the effluent pH, were used in the model update procedure, the regression coefficients for the other model input variables shrank remarkably from their normal values, and thus the updated model became insensitive to the variations of the model input variables. The incursion of these severe outliers also distorted the distribution of the SPE values, resulting in the increase of the confidence limits described previously.
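The leverage of such a short sensor fault on the window statistics can be illustrated with a small Python calculation. The pH series below is synthetic and purely hypothetical (readings alternating between 6.9 and 7.1, with four faulty readings of 14.0 in a 1000-point window); only the fault value, the fault duration, and the window size are taken from the text.

```python
from statistics import pstdev


def std_inflation(clean_window, n_faulty, fault_value):
    """Ratio of the window's standard deviation after vs before contamination."""
    # Replace the last n_faulty readings with the faulty sensor value
    contaminated = clean_window[:-n_faulty] + [fault_value] * n_faulty
    return pstdev(contaminated) / pstdev(clean_window)


# Synthetic effluent pH around 7.0 with a very small spread (hypothetical data)
window = [6.9, 7.1] * 500
ratio = std_inflation(window, n_faulty=4, fault_value=14.0)
```

In this synthetic case only 0.4% of the window is contaminated, yet the standard deviation of the variable grows to more than four times its clean value, which hints at how a handful of severe outliers can dominate an update based on the window's variance structure.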

The conventional adaptive PLS method seems to be more appropriate than the static PLS method for the online prediction and monitoring of an industrial process because it can adaptively capture process changes by updating the model periodically. As can be deduced from the above results, however, it still has the limitation that the model can lose its robustness when a number of severe outliers are used in the model update procedure.

Figure 4. Time profiles of the various process variables showing abrupt changes after 2300 h.

(4.3) Performance of Robust Adaptive PLS Method. (4.3.1) Hard Threshold Approach. The hard threshold approach may be one of the most intuitive methods to suppress the adverse effect of outliers during the model update. Because only the normal operation data passing the hard threshold were used for the model update, it was expected that the robustness of the model would always be guaranteed. Figure 6 shows the prediction and monitoring results obtained by applying the robust adaptive PLS method with the hard threshold approach. Overall, the prediction accuracy of the model was greatly enhanced compared with the results of the static and the adaptive PLS methods. The adaptation of the model also proceeded properly up to 2300 h. Unexpectedly, however, the model was never updated after 2300 h, resulting in rather problematic process monitoring ability. This interruption of the model update seems to be closely related to the introduction of the intentional process changes explained in the results of the static PLS method. Because the hard threshold approach strictly eliminates outliers, rapid and persistent process changes can hardly be reflected in the model.

It is very interesting that the prediction accuracy of the model was maintained satisfactorily throughout the whole time span even though the model was never updated after 2300 h. The prediction accuracy after 3500 h was even higher than that of the adaptive PLS model. These results imply that the operation data after 2300 h carry no essential information for enhancing the prediction accuracy of the model and that the operation data around 3500 h would even be harmful if they were used for the model update. Because the model had already been updated many times before 2300 h, capturing sufficient information to describe the whole process dynamics, the prediction accuracy of the model could be maintained satisfactorily thereafter. In general, however, it should be remembered that adequate predictions are possible only when the model is updated properly. Moreover, it should also be noted that in the given example the operation data after 2300 h still contain some critical information for enhancing the monitoring ability of the model, so they should be further reflected in the model if possible.

Figure 5. Prediction results and monitoring charts obtained by using the adaptive PLS method. Gray circles: measured values. Solid line: predicted values. Short dashed line: 95% confidence limit. Long dashed line: 99% confidence limit.


(4.3.2) Soft Threshold Approach. In the soft threshold approach, three different weight functions (i.e., Cauchy, Fair, and Bisquare) were tested to suppress the adverse effect of the outliers during the model update. The resultant performance of each weight function is summarized in Table 3 together with the performances of the other PLS modeling approaches considered in this study. Overall, the soft threshold approach showed better prediction accuracy than the others regardless of the weight function used. The monitoring ability of the soft threshold approach also seemed to be outstanding in all cases compared with the others. Among the tested weight functions, the Fair weight function gave the best performance, so its detailed results are illustrated in Figure 7. As can be seen in this figure, the robust adaptive PLS method with the Fair weight function performed fairly well in both prediction and monitoring. In particular, the monitoring ability of the model was greatly improved compared with the hard threshold approach. The fault detection ability of the model was never interrupted by any type of process change, and most alarms violating the confidence limits of $T^2$, SPE, and the combined monitoring index were clearly related to abnormal process operations such as process shut-down or back-flushing of the reactor.

Figure 6. Prediction results and monitoring charts obtained by using the robust adaptive PLS with hard threshold method. Gray circles: measured values. Solid line: predicted values. Short dashed line: 95% confidence limit. Long dashed line: 99% confidence limit.

Table 3. Performances of the Various PLS Modeling Approaches

category                                          RMSE(a)    adaptability(b)
static PLS                                        1.7818     no
adaptive PLS                                      1.6299     yes
robust adaptive PLS, hard threshold               1.2099     no
robust adaptive PLS, soft threshold (Cauchy)      1.1828     yes
robust adaptive PLS, soft threshold (Fair)        1.1812     yes
robust adaptive PLS, soft threshold (Bisquare)    1.1833     yes

(a) Root-mean-square error. (b) "Yes" means that the PLS model could track the process changes properly.

Although the detailed results are presented here only for the Fair weight function, the soft threshold methods using the other weight functions exhibited comparable results. The weight functions considered in this study have a common characteristic in that they are all continuous functions that generate a weight value from 0 to 1, depending on the combined monitoring index and its confidence limit. However, they generate different weight values for the same observation, resulting in different abilities to screen outliers. Figure 8 shows the weight value profiles obtained by applying the different weight functions. All weight functions considered here could generate proper weight values not only for the intentional process changes after 2300 h but also for the sensor faults around 3500 h. For the observations corresponding to the intentional process changes, the weight functions first generated moderately lowered weight values, which eventually increased again as the model adapted to the new process conditions. On the other hand, the weight functions generated weight values close to zero for the observations corresponding to the sensor faults.

Figure 7. Prediction results and monitoring charts obtained by using the robust adaptive PLS method with Fair weighting function. Gray circles: measured values. Solid line: predicted values. Short dashed line: 95% confidence limit. Long dashed line: 99% confidence limit.

It should be noted that each weight function had a different strictness in screening the outliers. In descending order of strictness, the weight functions seemed to rank as Bisquare, Cauchy, and Fair. However, it is unclear how this strictness is related to the performance of the weight function. Moreover, each weight function has its own tuning parameter, which was determined heuristically to provide the best performance in terms of both prediction and monitoring. Indeed, the weight functions behaved differently with different tuning parameters, and this made the selection of the optimum weight function somewhat problematic. Although the Fair weight function gave the best performance in this study, the performances of the weight functions could be completely different depending on the characteristics of the dynamics of the process being modeled.

5. Conclusions

In this paper, a new scheme of robust adaptive PLS method was proposed to overcome the limitations of the conventional PLS methods. A full-scale anaerobic filter process that showed highly complicated process dynamics was selected as a model process, and the feasibility of different PLS methods was investigated, focusing especially on their performance in the prediction and monitoring of the process. The conventional static PLS method showed very limited performance over the whole validation data set because of the highly time-varying dynamics of the model process. Although the conventional adaptive PLS method could reflect these time-varying process dynamics in the model, it also showed unsatisfactory performance after some severe outliers were used in the model update. On the other hand, most robust adaptive PLS methods based on the proposed scheme showed satisfactory prediction and monitoring performance, reasonably eliminating the adverse effect of the outliers during the model update. We believe that the presented robust adaptive PLS modeling approach could be successfully applied to the investigation of various industrial processes that have complex and time-varying process dynamics.

Acknowledgment

This work was financially supported by Samsung Petrochemical Co. Ltd. and by the ERC program of MOST/KOSEF (Grant R11-2003-006-01001-1) through the Advanced Environmental Biotechnology Research Center at POSTECH. This work was also supported by the program for advanced education of chemical engineers (second stage of BK21).

Literature Cited

(1) MacGregor, J. F.; Kourti, T. Statistical process control of multivariate processes. Control Eng. Practice 1995, 3, 403.

(2) Wise, B. M.; Gallagher, N. B. The process chemometrics approach to process monitoring and fault detection. J. Process Control 1996, 6, 329.

(3) Teppola, P.; Mujunen, S. P.; Minkkinen, P. Partial least squares modeling of an activated sludge plant: A case study. Chemom. Intell. Lab. Syst. 1997, 38, 197.

(4) Qin, S. J. Recursive PLS algorithms for adaptive data modeling. Comput. Chem. Eng. 1998, 22, 503.

(5) Rosen, C.; Lennox, J. A. Multivariate and multiscale monitoring of wastewater treatment operation. Water Res. 2001, 35, 3402.

(6) Lee, D. S.; Vanrolleghem, P. A. Monitoring of a sequencing batch reactor using adaptive multiblock principal component analysis. Biotechnol. Bioeng. 2002, 82, 489.

(7) Wang, X.; Kruger, U.; Lennox, B. Recursive partial least squares algorithms for monitoring complex industrial processes. Control Eng. Practice 2003, 11, 613.

(8) Li, W.; Yue, H. H.; Valle-Cervantes, S.; Qin, S. J. Recursive PCA for adaptive process monitoring. J. Process Control 2000, 10, 471.

(9) Rousseeuw, P. J. Least median of squares regression. JASA, J. Am. Stat. Assoc. 1984, 79, 781.

(10) Devlin, S. J.; Gnanadesikan, R.; Kettenring, J. R. Robust estimation of dispersion matrices and principal components. JASA, J. Am. Stat. Assoc. 1981, 76, 354.

(11) Rousseeuw, P. J.; van Zomeren, B. C. Unmasking multivariate outliers and leverage points. JASA, J. Am. Stat. Assoc. 1990, 85, 633.

(12) Wakeling, I. N.; Macfie, J. H. H. A robust PLS procedure. J. Chemom. 1992, 6, 189.

(13) Cummins, D. J.; Andrew, C. W. Iteratively reweighted partial least squares: A performance analysis by Monte Carlo simulation. J. Chemom. 1995, 9, 489.

(14) Gil, J. A.; Romera, R. On robust partial least squares (PLS) methods. J. Chemom. 1998, 12, 365.

(15) Pell, R. J. Multiple outlier detection for multivariate calibration using robust statistical techniques. Chemom. Intell. Lab. Syst. 2000, 52, 87.

(16) Geladi, P.; Kowalski, B. R. Partial least-squares regression: A tutorial. Anal. Chim. Acta 1986, 185, 1.

(17) Lee, M. W.; Joung, J. Y.; Lee, D. S.; Park, J. M.; Woo, S. H. Application of a moving-window neural network to the modeling of a full-scale anaerobic filter process. Ind. Eng. Chem. Res. 2005, 44, 3973.

Figure 8. Sample weights calculated from soft threshold method: (a) Cauchy, (b) Fair, and (c) Bisquare weight functions.

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007 963


(18) Box, G. E. P.; Jenkins, G. M.; Reinsel, G. C. Time series analysis: Forecasting and control; Prentice Hall: Englewood Cliffs, NJ, 1994.

(19) Dayal, B. S.; MacGregor, J. F. Recursive exponentially weighted PLS and its applications to adaptive control and prediction. J. Process Control 1997, 7, 169.

(20) Shi, R.; MacGregor, J. F. Modeling of dynamic systems using latent variable and subspace methods. J. Chemom. 2000, 14, 423.

(21) Ku, W.; Storer, R. H.; Georgakis, C. Disturbance detection and isolation by dynamic principal component analysis. Chemom. Intell. Lab. Syst. 1995, 30, 179.

(22) Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716.

(23) Wu, T. J.; Sepulveda, A. The weighted average information criterion for order selection in time series and regression models. Stat. Probab. Lett. 1998, 39, 1.

(24) Wise, B. M.; Gallagher, N. B. PLS toolbox, version 2.1; Eigenvector Research, Inc.: Wenatchee, WA, 2000.

(25) Wold, S. Cross-validatory estimation of the number of components in factor and principal component analysis. Technometrics 1978, 20, 397.

(26) Li, B.; Morris, J.; Martin, E. B. Model selection for partial least squares regression. Chemom. Intell. Lab. Syst. 2002, 64, 79.

(27) Jackson, J. E. A user’s guide to principal components; Wiley-Interscience: New York, 1991.

(28) Vijaysai, P.; Gudi, R. D.; Lakshminarayanan, S. Identification on demand using blockwise recursive partial least-squares technique. Ind. Eng. Chem. Res. 2003, 42, 540.

(29) Yue, H. H.; Qin, S. J. Reconstruction-based fault identification using a combined index. Ind. Eng. Chem. Res. 2001, 40, 4403.

Received for review August 18, 2006
Revised manuscript received October 31, 2006
Accepted November 13, 2006

IE061094+
