19
Online Identification of Takagi-Sugeno Fuzzy Models Based on Self-Adaptive Hierarchical Particle Swarm Optimization Algorithm Saeid Rastegar a,, Rui Araújo a , Jérôme Mendes a a Institute of Systems and Robotics (ISR-UC), and Department of Electrical and Computer Engineering (DEEC-UC), University of Coimbra, Pólo II, PT-3030-290 Coimbra, Portugal Abstract This paper presents an approach for online learning of Takagi-Sugeno (T-S) fuzzy models. A novel learning algorithm based on a Hierarchical Particle Swarm Optimization (HPSO) is introduced to automatically extract all fuzzy logic system (FLS)’s parameters of a T-S fuzzy model. During online operation, both the consequent parameters of the T-S fuzzy model and the PSO inertia weight are continually updated when new data becomes available. By applying this concept to the learning algorithm, a new type T-S fuzzy modelling approach is constructed where the proposed HPSO algorithm includes an adaptive procedure and becomes a self-adaptive HPSO (S-AHPSO) algorithm usable in real-time processes. To improve the computational time of the proposed HPSO, particles positions are initialized by using an ecient unsupervised fuzzy clustering algorithm (UFCA). The UFCA combines the K-nearest neighbour and fuzzy C-means methods into a fuzzy modelling method for partitioning of the input-output data and identifying the antecedent parameters of the fuzzy system, enhancing the HPSO’s tuning. The approach is applied to identify the dynamical behaviour of the dissolved oxygen concentration in an activated sludge reactor within a wastewater treatment plant. The results show that the proposed approach can identify nonlinear systems satisfactorily, and reveal superior performance of the proposed methods when compared with other state of the art methods. Moreover, the methodologies proposed in this paper can be involved in wider applications in a number of fields such as model predictive control, direct controller design, unsupervised clustering, motion detection, and robotics. Keywords: Online learning of Takagi-Sugeno model, Self-adaptive Hierarchical PSO Algorithm, Activated sludge process 1. Introduction System identification based on experimental data has been considered as a powerful practical engineering tool in many industrial control fields. A common point among most control methodologies is their assumption of the knowledge of an accurate model of the process to be controlled. Mostly a good control performance can be obtained if a good system modeling of a plant can be estimated. This assumption may cause important problems. It happens a lot that in a system identification process due to complexity in plant behavior or large uncertainties, obtaining an explicit mathematical model based on physical laws for the plant meets with diculties. Dierent identification approaches to modeling nonlinear plants were employed to be used in control design. Among them, fuzzy models have received particular attention in the area of nonlinear modelling (Škrjanc, 2009), especially the Takagi-Sugeno (T-S) fuzzy models (Takagi and Sugeno, 1985). The main feature of a T-S fuzzy model is that it may express the local dynamics of each fuzzy implication (rule) by a linear system model (Barros and Dexter, 2007; Chang et al., 2010; Hua et al., 2013; Dovzan et al., 2015). The T-S fuzzy model parameters can be estimated in either or both oine or online modes. However, an online learning mode can have superiority because in most cases, the collected data set used in oine methods is limited, and the estimated T-S fuzzy model may not provide adequate accuracy in parts of, Corresponding author Email addresses: [email protected] (Saeid Rastegar ), [email protected] (Rui Araújo), [email protected] (Jérôme Mendes) 1

Online Identification of Takagi-Sugeno Fuzzy Models …home.deec.uc.pt/~rui/publications/amm2017_v45May_606_web.pdf · Online Identification of Takagi-Sugeno Fuzzy Models Based

  • Upload
    lyanh

  • View
    216

  • Download
    0

Embed Size (px)

Citation preview

Online Identification of Takagi-Sugeno Fuzzy Models Based on Self-AdaptiveHierarchical Particle Swarm Optimization Algorithm

Saeid Rastegara,∗, Rui Araújoa, Jérôme Mendesa

aInstitute of Systems and Robotics (ISR-UC), and

Department of Electrical and Computer Engineering (DEEC-UC),

University of Coimbra, Pólo II, PT-3030-290 Coimbra, Portugal

Abstract

This paper presents an approach for online learning of Takagi-Sugeno (T-S) fuzzy models. A novel learning algorithmbased on a Hierarchical Particle Swarm Optimization (HPSO) is introduced to automatically extract all fuzzy logicsystem (FLS)’s parameters of a T-S fuzzy model. During online operation, both the consequent parameters of theT-S fuzzy model and the PSO inertia weight are continually updated when new data becomes available. By applyingthis concept to the learning algorithm, a new type T-S fuzzy modelling approach is constructed where the proposedHPSO algorithm includes an adaptive procedure and becomes a self-adaptive HPSO (S-AHPSO) algorithm usable inreal-time processes. To improve the computational time of the proposed HPSO, particles positions are initialized byusing an efficient unsupervised fuzzy clustering algorithm (UFCA). The UFCA combines the K-nearest neighbourand fuzzy C-means methods into a fuzzy modelling method for partitioning of the input-output data and identifyingthe antecedent parameters of the fuzzy system, enhancing the HPSO’s tuning. The approach is applied to identifythe dynamical behaviour of the dissolved oxygen concentration in an activated sludge reactor within a wastewatertreatment plant. The results show that the proposed approach can identify nonlinear systems satisfactorily, and revealsuperior performance of the proposed methods when compared with other state of the art methods. Moreover, themethodologies proposed in this paper can be involved in wider applications in a number of fields such as modelpredictive control, direct controller design, unsupervised clustering, motion detection, and robotics.

Keywords: Online learning of Takagi-Sugeno model, Self-adaptive Hierarchical PSO Algorithm, Activated sludgeprocess

1. Introduction

System identification based on experimental data has been considered as a powerful practical engineering toolin many industrial control fields. A common point among most control methodologies is their assumption of theknowledge of an accurate model of the process to be controlled. Mostly a good control performance can be obtainedif a good system modeling of a plant can be estimated. This assumption may cause important problems. It happensa lot that in a system identification process due to complexity in plant behavior or large uncertainties, obtaining anexplicit mathematical model based on physical laws for the plant meets with difficulties. Different identificationapproaches to modeling nonlinear plants were employed to be used in control design. Among them, fuzzy modelshave received particular attention in the area of nonlinear modelling (Škrjanc, 2009), especially the Takagi-Sugeno(T-S) fuzzy models (Takagi and Sugeno, 1985). The main feature of a T-S fuzzy model is that it may express the localdynamics of each fuzzy implication (rule) by a linear system model (Barros and Dexter, 2007; Chang et al., 2010;Hua et al., 2013; Dovzan et al., 2015). The T-S fuzzy model parameters can be estimated in either or both offline oronline modes. However, an online learning mode can have superiority because in most cases, the collected data setused in offline methods is limited, and the estimated T-S fuzzy model may not provide adequate accuracy in parts of,

∗Corresponding authorEmail addresses: [email protected] (Saeid Rastegar ), [email protected] (Rui Araújo), [email protected] (Jérôme Mendes)

1

or the whole, operating areas of the plant (Salahshoor et al., 2012; Li and Du, 2012). Many optimization methodswere investigated to be used in T-S fuzzy modelling. Among them, evolutionary algorithms such as particle swarmoptimization (PSO) algorithms and genetic algorithms (GAs) have shown a good adaptation to search for optimal T-Sfuzzy model parameters (Delgado et al., 2001; dos Santos Coelho and Herrera, 2007; Shuzhi et al., 2012). Although aGA may be able to find the global minimum, it consumes too much search time. The PSO algorithm is an alternative.It can result in a lower computational complexity with a better performance of fitness evaluation in general whencompared with GAs (Eberhart and Shi, 1998), consequently resulting in a better prediction performance when appliedto learn a T-S fuzzy model. Ali and Ramaswamy (2009) presented an optimal fuzzy logic control (FLC) algorithm forvibration mitigation of buildings using magneto-rheological (MR) dampers. A micro-genetic algorithm (m-GA) anda standard particle swarm optimization (SPSO) were used to optimize the FLC parameters. In the method, only themembership function parameters are optimized, while the other components of the fuzzy system are considered to befixed. Other common limitation of this method, and other identification methods such as (Rastegar et al., 2014, 2016),concerns to the selection of the correct set of input variables. The variable selection process is usually manual and notaccompanied with the accurate selection of the right time delays, probably leading to low-accuracy results. A variablewith the correct delay may contain more information about the output, than one which does not consider any delay orwhich considers an incorrect delay (Souza et al., 2010). The method presented in (Rastegar et al., 2014, 2016) uses aPSO algorithm to construct a self adaptive T-S model identification methodology, but the method lacks an automaticinput selection system. An adaptive PID controller based on adaptive PSO was proposed in (Alfi and Modares, 2011).In most of the cases, controlling the inertia weight of PSO has been done based on a constant parameter, or on alinearly decreasing inertia weight PSO (LDW-PSO) has also been used (Shi and Eberhart, 1998). But in (Alfi andModares, 2011), the inertia weight was dynamically adapted for every particle by considering a measure called theadjacency index (AI). This idea lead the classical LDW-PSO algorithm to be evolved into a new class of systemmodelling based on an adaptive PSO (APSO) methodology. The results proved a superiority of APSO performancewhen compared with the LDW-PSO or GA.

The prediction performance of a Takagi-Sugeno fuzzy model depends on its complexity (e.g. number of fuzzyrules), on the type of membership functions, on the antecedent variables, and on the consequent regressors (Yao et al.,2014). A hierarchical genetic algorithm (HGA) was suggested by Delgado et al. (2001) to find optimal parametersof T-S fuzzy systems through an evolutionary genetic algorithm and a neuro-based technique, and then was improved(Delgado et al., 2009) by proposing an idea for pre-selection of input variables using an auxiliary criterion. However,the variable and delay selection are not jointly performed with the learning of the fuzzy model, which precludes theglobal optimization of the prediction setting. As an evolution of the work (Delgado et al., 2001, 2009), Mendeset al. (2012) proposed an identification methodology based on the application of a HGA, and also the proposedidentification method was applied for the design of a direct control methodology (Mendes et al., 2014b). The mainadvances in this work are the improvement of the whole hierarchical structure, automatically extracting the fuzzycontrol rules. Despite of having a successful performance, the HGA did not utilize any significant improvement inits interior structure to be augmented for online applications. A related methodology for online applications wasproposed in (Mendes et al., 2013), using a combination with other method, the recursive least squares (RLS) methodwith adaptive directional forgetting approach (Kulhavý, 1987). For automatic extraction of all FLS parameters, thenumber of partitions is a key parameter. Approximately all the FLS units of a T-S fuzzy model such as the set offuzzy rules, the individual rules, the consequent parameters, as well as the unit of the FLS, use this parameter duringthe FLS construction. However, in both the methods proposed in (Delgado et al., 2001) and in (Mendes et al., 2012),the number of partitions was defined firstly and then considered as a constant value during all the running time of thealgorithm.

Motivated by the issues discussed above, this work first introduces a novel identification methodology which, dif-ferently from (Rastegar et al., 2014, 2016), is based on a hierarchical PSO (HPSO) to learn the FLS’s parameters of aT-S fuzzy model from a set of input/output data. The use of a hierarchical structure has the advantage of decreasingthe complexity of the learning method, also leading to the improvement of computational time and prediction perfor-mance. The search engine design of the HPSO was inspired by the method proposed in (Mendes et al., 2012), buthas the following three main advantages in comparison to (Mendes et al., 2012): less running time, larger predictionaccuracy, and possibility to be used in online model learning. Unlike the works in (Rastegar et al., 2014, 2016) whichuse a simple sequential structure, and where no selection of input variables and delays is performed, the algorithmproposed in this paper is composed of a six level hierarchy, where the first level is responsible for the selection of

2

an adequate set of input variables and delays. The second level considers the antecedent membership functions. Theconsequent parameters are defined on the third Level. Level four is devoted to particles of individual rules. The set ofrules are obtained on the fifth level, and finally, the sixth level is appointed as the control unit of FLS, specifying thechoice of several configurations in the lower levels. For improving the convergence time, an initialization algorithmbased on an efficient unsupervised fuzzy clustering algorithm (Rastegar et al., 2016) is employed to initialize parti-cles’ positions inside the HPSO, as well as the number of partitions. Finally, an online adaptive approach based on theHPSO is proposed which results in a self-adaptive HPSO (S-AHPSO) approach to make an online learning/updatingof the fuzzy T-S model.

To validate the performance and effectiveness of the proposed algorithm, prediction of the dynamic behavior ofthe dissolved oxygen concentration in an activated sludge reactor within a wastewater treatment plant (WWTP) isinvestigated. WWTPs are large and complex non-linear systems subject to large disturbances in influent flow rate andpollutant load, together with uncertainties concerning the composition of the incoming wastewater (Belchior et al.,2012). In (Belchior et al., 2012), a basic fuzzy controller and an adaptive fuzzy controller (AFC) were developedand applied to control the dissolved oxygen in an activated sludge reactor within a WWTP. Using the BFC controller,a learning data set was obtained for the proposed HPSO algorithm to learn the T-S fuzzy model of, and predict,the dissolved oxygen concentration in an activated sludge reactor. The extracted parameters of the T-S fuzzy modelwhich are continually updated by the S-AHPSO can be used to design an adaptive and/or predictive fuzzy controllersuch as the AFC in (Belchior et al., 2012) or the adaptive fuzzy generalized predictive controller (AFGPC) whichrecently has shown a successful performance in non-linear industrial applications (Mendes et al., 2013). The proposedmethod is quantitatively compared with two recursive approaches: a recursive partial least squares (RPLS) (Dayal andMacGregor, 1997), and an RLS with a directional forgetting approach (Kulhavý, 1987) which uses fuzzy C-means(FCM) (Windham, 1982) for initialization. Furthermore, the results are compared with the results of four othermethods: using a multi-layer perceptron neural network (MLPNN) for nonlinear system identification (Nelles, 2001),a new fuzzy c-regression model algorithm (NFCRMA) proposed by Li et al. (2009), an extreme learning machine forregression (ELM) (Huang et al., 2012), and a genetic fuzzy system for data-driven soft-sensor design (Mendes et al.,2012). Moreover, the performance of an identification method based on adaptive PSO (APSO) (Alfi and Modares,2011) is also evaluated and compared. The system identification methodology proposed in this work can deal withnon-linear plants, time-varying processes, varying operating regions and parameters of the model.

The paper is organized as follows. Section 2 presents a description about the nonlinear system modelling method-ology based on the T-S fuzzy model, and on the fuzzy C-means clustering algorithm. Section 3 briefly presentsthe PSO algorithm. Section 4 presents the proposed HPSO algorithm which will be accompanied initially by theintroduction of the UFCA as an initialization algorithm to enhance the HPSO’s performance, and subsequently bythe presentation of the S-AHPSO algorithm which proposes an online learning/updating of the T-S fuzzy model. InSection 5, results in identification of a plant are presented and analysed. Finally, Section 6 makes concluding remarks.

2. T-S Fuzzy Models Based on Fuzzy C-Means Clustering

In this section it is briefly presented a methodology for modeling nonlinear processes. The structure is based onthe T-S fuzzy model, and the initialization of its antecedent part is performed by learning from data using the FuzzyC-Means Clustering method. The antecedent part will then be improved using the proposed methodology presentedon Section 4.

2.1. Modelling Using T-S Fuzzy Models

This section describes the T-S fuzzy model that with their simplified linear rule consequents are universal approx-imators capable of approximating any continuous nonlinear system (Ying, 1997). In this way, a nonlinear system withcontinuous constituent functions can be described a T-S fuzzy model composed by the following fuzzy rules:

Ri : IF x1(k) is Ai1, and . . . and xN(k) is Ai

N

THEN y(k) = yi(k) = θi1x1(k) + · · · + θiN xN(k), (1)

3

where Ri (i = 1, 2, . . . , c) represents the i-th fuzzy rule, c is the number of rules, and x1(k), . . . , xN(k) are the inputvariables of the T-S fuzzy model. Ai

j( j = 1, 2, . . . ,N) are linguistic terms characterized by fuzzy membership func-

tions µAij(x j) which describe the local operating regions of the plant. θi1, · · · , θiN are model parameters of yi(k). From

(1), the T-S fuzzy model y(k) can be rewritten as

y(k) =c

i=1

ωi[x(k)]xT(k)θi = Ψ(k)Θ, (2)

where for i = 1, . . . , c, and assuming Gaussian membership functions,

x(k) = [x1(k), . . . , xN(k)] , (3)

µAij(x j) = exp

(

−(x j − vi j)2

σi j

)

, j = 1, . . . ,N, (4)

ωi[x(k)] =

∏Nj=1 µAi

j(x j)

∑cp=1

∏Nj=1 µA

p

j(x j), (5)

θi = [θi1 . . . , θiN]T , (6)

Θ =[

θT1 , θ

T2 , . . . , θ

Tc

]T, (7)

Ψ(k) =[(

ω1[x(k)])

x(k), . . . , (ωc[x(k)]) x(k)]

, (8)

where vi j andσi j are the center and width of the antecedent membership functions, respectively, to be learned/designed.In an initial stage, vi j and σi j will be learned by the Fuzzy C-means method (Section 2.2), and then will be improvedusing the proposed methodology presented on Section 4.

2.2. Fuzzy C-Means

This section describes the fuzzy c-means (FCM) method. In the fuzzy clustering methods the objects can belongto a predefined number of clusters, c. Defining xl = [xl1, . . . , xlN]T ∈ RN as an observation l composed by a set of N

samples, one sample for each input variable j. A set of L observations is then defined as

X =

x11 x12 . . . x1N

x21 x22 . . . x2N

......

......

xL1 xL2 . . . xLN

. (9)

The fuzzy partition of the set X into pre-defined number of clusters c, is a family of fuzzy subsets {Ai | 1 ≤ i ≤ c},in which their membership functions are defined as µi(l) = µAi (x

l), and form the fuzzy partition matrix U = [uil] =

[µi(l)] ∈ Rc×L. Each row i of matrix U contains the values of the membership function of the i-th fuzzy subset Ai

for all the observations stored in the data matrix X. Similarly to (Dovžan and Škrjanc, 2011), the partition matrixshould satisfy the following conditions: (i) The membership degrees are real numbers in the interval µi(l) ∈ [0, 1], forl = 1, . . . , L; The total membership of each sample in all the clusters must be equal to one

∑ci=1 µi(l) = 1; (ii) And none

of the fuzzy clusters is empty, neither do any contain all the data 0 <∑L

l=1 µi(l) < L, for i = 1, . . . , c. FCM methodwill obtain the antecedents parameters vi j and σi j by minimizing the following objective function:

J(X,U,V) =c

i=1

L∑

l=1

[µi(l)]ηd2

il(xl, vi), (10)

4

where V = [v1, . . . , vc]T ∈ Rc×N is a matrix of cluster centroid vectors vi = [vi1, · · · , viN]T , dil(xl, vi) is the Euclideandistance (l2-norm) between the observation xl and the cluster centroid vi, and the overlapping factor or the fuzzinessparameter η influences the fuzziness of the resulting partition. The partition can range from a hard partition (η = 1)to a completely fuzzy partition (η → ∞). If the derivative of the objective function (10) is taken with respect tothe cluster centers V and to the membership values U, then optimum membership values are calculated as follows(Dovžan and Škrjanc, 2011):

µi(l) =

d2il

c∑

q=1

(

d2ql

)1/(η−1)

−1

, (11)

where

d2il = (xl − vi)

T (xl − vi) , (12)

and

vi =

∑Ll=1 µ

η

i(l)xl

∑Ll=1 µ

η

i(l). (13)

The centers vi j of the antecedent membership functions are obtained by (13) where vi = [vi1, · · · , viN]T , and therespective widths, σi = [σi1, · · · , σiN]T , i = 1, . . . , c, are obtained using U = [µi(l)], as follows:

σi j =

2L∑

l=1µi(l)(xl j − vi j)2

L∑

l=1µi(l)

, j = 1, . . . ,N. (14)

FCM is an unsupervised clustering algorithm. For unsupervised clustering algorithm, a cluster validity index canbe employed to evaluate the fitness of partitions produced by clustering algorithms. This work uses the Tang and Sun(2005) validity index which is based on the evaluation two criteria, compactness and separation, and is defined asfollows:

VT (U,V; X) =

∑ci=1

∑Ll=1 µ

2il‖ xl − vi ‖2

mini,k‖ vi − vk ‖2 +1/c

+

1c(c−1)

∑ci=1

∑ck=1,k,i ‖ vi − vk ‖2

mini,k‖ vi − vk ‖2 +1/c

, (15)

where c is the number of clusters, and vi is the center of cluster i, i = 1, ...c. The numerator of the second term in (15)is an ad hoc punishing function (average distance between cluster centers) which is applied to eliminate the decreasingtendency of VT (·) as c→ L.

3. Particle Swarm Optimization (PSO) Algorithm

The particle swarm optimization (PSO) concept is based on a stochastic optimization technique which was firstproposed by Kennedy and Eberhart (1995). The algorithm was inspired by the social behavior of birds or fishes. Thebasic PSO concept is trying to simulate the choreographed motion of swarms of birds as part of a socio-cognitivescience. Like genetic algorithms (GA), PSO is initialized with a, typically random, population of solutions. However,unlike GA, PSO has no evolution operators such as cross over, or mutation. Compared to GA, PSO is easier to beimplemented. With PSO, there are just a few parameters that need to be adjusted before PSO is put to run. In PSO,the potential candidate solutions are called ‘particles’ which try to fly through the problem space. Performance ofeach particle is evaluated through a fitness function which needs to be optimized. In every iteration, each particle isupdated by following two best fitness function values. The first one is the best solution which a particle has obtainedso far. In this paper this value is called “pbest”. Another value is the best solution that was obtained by any particle

5

in the swarm. In this paper this value is called “gbest”. At each iteration, t, the velocity of every particle vtr will be

iteratively calculated as follows:

vt+1r = w vt

r + c1r1(pbesttr − xt

r) + c2r2(gbestt − xtr), (16)

where xtr is the position of the particle r in iteration t, pbestt

r is the best previous position of this particle (memorizedby each individual particle), gbestt is the best previous position among all the particles in iteration t (memorized ina common repository), w is the inertia weight, and c1 and c2 are positive acceleration coefficients and are known asthe cognitive and social parameters, respectively. Finally, r1 and r2 are two random numbers in the range [0, 1]. Aftercalculating the velocity, the new position of every particle can be obtained as

xt+1r = xt

r + vt+1r . (17)

To better address the notation of the particles inside each level of the HPSO algorithm proposed in this work, andalso based on the case study in this work, equation (16) is rearranged as follows:

vpr (t + 1) = w

pr (t)vp

r (t) + c1r(p)r1(pbestpr (t) − x

pr (t)) + c2r(p)r2(gbestp(t) − x

pr (t)), (18)

where, wpr (t) and pbest

pr (t) are the inertia weight and the best personal experience of particle r in iteration t inside

Level p, respectively. gbestp(t) is the best previous experience among all the particles inside Level p in iteration t.c1r(p) and c2r(p) are acceleration coefficients of the r-th particle in Level p. Then, (17) is rearranged as:

xpr (t + 1) = x

pr (t) + v

pr (t + 1). (19)

The learning process of the HPSO algorithm utilizes the conventional linearly decreasing inertia weight PSO(LDW-PSO) concept (Shi and Eberhart, 1998), which means that for each individual particle r, and iteration t:

wpr (t) = w

pr,max − (wp

r,max − wp

r,min) ∗ t/T1, (20)

where wpr (t) is the inertia weight in iteration t in Level p. w

pr,max, w

p

r,minare the initial and the final (minimum) inertia

weight of particle r in Level p, respectively, and T1 is the number of PSO iterations.

4. Proposed T-S Fuzzy Model Identification Algorithm

This section proposes a novel learning algorithm based on a hierarchical particle swarm optimization (HPSO)algorithm to automatically extract all the parameters of the T-S fuzzy model from a set of input/output data. First,before presenting the HPSO, and in order to avoid random initialization of the particles in the HPSO and to improvecomputational time, an initialization algorithm based on an efficient unsupervised fuzzy clustering algorithm (UFCA)is employed to initialize the particles positions of the HPSO. Finally, after presenting the central structure of theHPSO, a self-adaptive algorithm based on the HPSO will be presented to perform an online learning/updating of theT-S fuzzy model from data.

4.1. Initialization Algorithm

In PSO algorithms, random initialization of the particles can result in an exhausting optimality search. In thiscondition more iterations are required to attain convergence. Finding a solution to reduce the computational time,which at the same time can increase the algorithm’s performance is desired. To achieve these desired goals, anefficient unsupervised fuzzy clustering algorithm (UFCA) (Rastegar et al., 2016) is applied to initialize the particlesof the proposed HPSO algorithm. The complete UFCA algorithm is presented in Algorithm 1. The FCM is a well-known fuzzy clustering method which allows one object in a group to belong to two or more clusters. The FCMuses some pre-defined initial values such as the number of clusters, initial cluster centers and fuzziness weightingexponent η. By minimization of the objective function J in (10), converge of FCM performance is attained. The FCMhas been frequently used in pattern recognition. However, in many conditions it was seen that random initializationof the FCM causes the algorithm converge to a local optimal solution. For example, the number of clusters for

6

Algorithm 1 Unsupervised fuzzy clustering algorithm (UFCA).

1. Construct the matrix X = [xl j]L×N, 1 ≤ l ≤ L and 1 ≤ j ≤ N, in (9) using L observations;

2. Choose the degree of fuzziness η > 1; And let g0 be the center of data X, and v∗t ← 0;3. Repeat the procedure below for c = 1, 2, . . . cmax =

√L:

(a) Initialization for iteration c:i. Let K =

Lc− 1

, and I = {1, 2, . . . , L}, where ⌊·⌋ is the floor function;(b) For i = 1, . . . , c construct Ei using K nearest neighbourhood:

i. In I find the index i of the unknown sample xi which is farthest from gi−1;ii. Ei = {xi} ∪KNN(K − 1, xi), where KNN(K − 1, xi) is the set of K − 1 nearest-neighbour samples of xi that do not

belong to any other already existing Ei.

iii. Let gi =

xk∈Eixk

K, and Ei ← Ei ∪ {gi};

iv. I ← I \ {i} \ IKNN(K − 1, xi), where IKNN(K − 1, xi) is the set of all indices n such that xn ∈ KNN(K − 1, xi);(c) While I , ∅, do:

i. Select r ∈ I, let I ← I\{r}, and calculate the distances from the still unclustered sample xr to the center gi of allEi by d(xr, gi),∀i = 1 . . . c;

ii. Assign xr to the Ei with the nearest gi, so that Ei ← Ei ∪ {xr};iii. Perform the update of gi =

xk∈Eixk

K+1 ;(d) Perform one iteration of FCM:

i. Calculate the fuzzy clustering matrix U = [µil]c×L using (11)-(12) with vi = gi in (12);ii. Calculate the clustering validity index by (15) and assign it to vt.

(e) If vt > v∗t , theni. Let the optimal number of clusters be c∗ ← c;

ii. Let E∗i ← Ei, for i = 1, . . . c∗, be the optimal clustering sets;iii. Update the optimal clustering validity index: v∗t ← vt.

4. Using U = [µil]c∗×L calculate vi and σi j by (13)-(14).

the FCM initialization is considered as an important parameter which plays an important role in convergence ofthe FCM performance. Finding a solution to find an optimal number of clusters by using real input/output datasetinstead of using a random initialization, can be an efficient strategy. To overcome random initialization concern forclusters number, this paper employs the UFCA which has been recently proposed in (Rastegar et al., 2016) as a hybridclustering algorithm based on two layers. UFCA iteratively tests several values for the number of clusters c in therange of c = 1, 2, . . . cmax =

√L, in order to find an optimal number of clusters which is denoted as c∗ in this paper.

The first layer of UFCA uses a technique based on the K-Nearest Neighbor (KNN) approach (Cover and Hart, 1967)(Step 3b) to obtain the initial centers of the clusters, for each c. The K value in the K-Nearest Neighbor (KNN) isgiven by K =

Lc− 1

, where L is total number of samples of a data set. The basic purpose in using of the KNNmethod is to try to identify, from L samples of a data set, the K samples which have the largest levels of homogeneityto a specified feature vector. Specifically, in the first layer of UFCA, data set X of (9) is divided into c clusters, inwhich members of each cluster have homogeneity in the Euclidean distance sense, and will belong to one set Ei (Steps3a-3c). Ei is an auxiliary set of samples to gather the members of tentative cluster i. Once all Ei sets are constructedfor a certain c, the algorithm arrives to the second level of UFCA where (only) one iteration of FCM is performed(Step 3d). The final step of UFCA consists on determining the best c, and the corresponding collection of the bestEi (i = 1, . . . , c), which are termed as c∗, and E∗

i(i = 1, . . . , c∗), respectively (Step 3e). In the process of finding the

best c, after the calculation of the FCM fuzzy clustering matrix U = [µil], and clusters centers vi, (i = 1, . . . , c), theclustering validity index (15) is employed to evaluate the quality of the clustering results for each c.

The results extracted by Algorithm 1 will be used to initialize the first individual of Level 1 of the HPSO algorithm(Algorithm 2, Sec. 4.2) and all the individuals of Levels 2, 4, and 5 of HPSO. Furthermore, the initial solution is alsoused to initialize alleles of the first individual in Level 6 of HPSO. The other individuals of Levels 1 and 6, and allindividuals of Level 3 are randomly initialized.

7

1 1 1 0

Level 1

10 0 1

01 1 0

set of input variables

Level 2

Individual at Partition Set

Level 4

3 5 8

Level 5

41 6

52 6

6 10 7

Level 6

310 7

11 1

unit of fuzzy logic system (FLS) 9 10 13

10 14 16

......

...

0

...

...

...

...

...

......1210

set of fuzzy rules

0.2 3.7 1.40.6 8.2

...

14.4 2.4 4.2

...

1.1 4.813.7

1.3 2.43.7 24.5 3.2... 12.4 8.2 1.31.2 3.610.7

0.23.9 11.2 3.1 314.3 7.7 1.8 4.2 4.2... 0.3 4.010.7

a

a b

a b

b

z

s

s

s

10

1

4

1...

... individual fuzzy rule

d

d

0

7

6

1

5 1

8

310

th 8

...

...

0.8 1.8 -0.5

0.4 -1.2 0.8

0.7 -0.6 0

0 -0.3 2.3

0.3 -1.2 0.7

......

...

-1.1 0.2 0.9

...

...

Level 3

Index z=1

Index z=8

10

1

1.7

set of consequent parameters

5.2

x1 x2 x3 x4

y(t-1) y(t-2) u(t-1) u(t-2)

7.7

σ1(x1) v1(x1) σ2(x1) v2(x1) σc(x1) vc(x1)

...

σ1(x2) v1(x2) σ2(x2) v2(x2) σc(x2) vc(x2)

...

σ1(x4)

4.8

3.6

4.0

v1(x4)

...

...

d 3

2

NUFCA

Algorithm 1

...

....

....

....

....

...

...

Alleles with fuzzy membership functions from a specific input variable

Separates the fuzzy membership values of two different input variables

1 linguistic termth

Figure 1: Encoding of hierarchical relations among the individuals of different levels of the HPSO algorithm in presence of UFCA.

4.2. Hierarchical Particle Swarm Optimization (HPSO) Algorithm

This section presents HPSO, an automatic evolutionary algorithm to extract from data all the FLS parameters ofthe T-S fuzzy model (1)-(8), not requiring prior explicit expert knowledge. The data from which the FLS is extracted iscomposed of a set of input observations X (9), and a corresponding set of output observations y =

[

y1, . . . , yL

]T ∈ RL,where yl is the output corresponding to xl, for l = 1, . . . , L.

The HPSO is constructed based on six hierarchical levels (Figure 1). The first level represents the particles of theset of input variables and their respective time delays. The particles of the second level represent all the antecedentfuzzy membership functions which constitute the fuzzy rules of the fuzzy system. Particles at the third level representparameters sets for the consequent parts of the fuzzy rules. The individual rule particles are defined on the fourthlevel, and the particles that represent sets of rules are obtained on the fifth level. Finally, the sixth level representsT-S fuzzy models, where the particles include the indices of the selected elements of the previous levels. The detaileddescription of each level is given below.

Level 1: it is composed of particles that represent the set of input variables, and respective delays. The particles ofthis level are represented by binary encoding, where each allele (element of the particle located in a specific position)corresponds to each input variable/delay pair (see Figure 1). The length of the particle is given by the total number ofpairs of system variables and respective delays that are considered as possible candidates to be used as inputs for the

8

T-S fuzzy system.Level 2: contains Gaussian membership functions defined in the universes of the variables involved. The particle

is erected by the aggregations of all Gaussion partition sets associated with the input variables. A partition set of avariable is a collection of fuzzy sets associated to the variable.

Level 3: it is constructed based on particles, where each particle represents the consequent parameters that canbe used for a fuzzy rule. The length of each particle in this level is determined by the maximum number of inputvariables. The particle is represented by real number encoding. On each particle, the specific alleles that are takeninto account for a rule are the ones that correspond to the collection of input variables selected by Level 1. Null valuesindicate the absence of consequent parameter value for the corresponding variable.

Level 4: it is formed by particles of individual rules. The length of the particle is given by the maximum numberof antecedent variables plus one. The particle is represented by non-negative integer encoding, where each allelecontains the index of the corresponding antecedent membership function (defined at Level 2), except the last allelein each particle which is appointed to select the indices of Level 3. Null values indicate the absence of membershipfunction for the corresponding variable (i.e. the absence of the variable) and are only considered for antecedent indices.

Level 5: it is constituted by a set of fuzzy rules, where each allele contains the index of the corresponding individualrule that has been included in the set. The particle is represented by integer encoding, where once again, null valuesindicate that the corresponding allele does not contribute to the inclusion of any rule in the set of fuzzy rules. Thelength of each particle is determined by the maximum number of fuzzy rules.

Level 6: it represents a control and specification unit for the T-S fuzzy model (TSFM). Key information requiredto develop and describe a TSFM design is contemplated on this level. The particle is represented by integer encodingand is constituted by three alleles. The first allele indicates the index, b, of the set of rules specified on Level 5. Thesecond allele contains the s-th invividual of Level 2 which represents the s-th coolection of partition sets given byLevel 2. The third allele represents the index, m, of the set of input variables selected on Level 1 (see Figure 1).

Level 6 is a flexible unit. Other alleles can be added to this unit in order to design a new HPSO usable withdifferent characteristics. For example, a new extended HPSO with more complexity can be constructed for a directfuzzy controller (DFC) design, where other alleles can be added to correspond to other important fuzzy elements suchas t-norm operators, inference engine, and defuzzifier methods, and where, consequently, such additional elementsof the fuzzy system can have a chance to be identified in a competition by the PSO methodology. However, as canbe concluded from section 2.1, the design of the T-S fuzzy model (as the case study of this work) is composed by:singleton fuzzifier (direct representation of a number by a singleton fuzzy set), center average defuzzifier, and theproduct inference engine (Wang, 1996).

The main steps of the HPSO algorithm used to learn/improve the FLS parameters are presented in Algorithm 2.Each level utilizes its own PSO parameters and fitness function to estimate the new position of the particles in thatlevel. The fitness evaluation of each individual of the hierarchical particles in each level is defined as follows:

• Control unit of the fuzzy logic system (Level 6):The value of the fitness function of the a-th particle in Level 6 is defined by

Ja6 =

1MSE(y, ya)

, (21)

where MSE(y, ya) = 1L

∑Ll=1 (yl − yl)2, is the mean square error of the a-th fuzzy system (a = 1, . . . , amax), yl is

the l-th output observation (target), and yl is the corresponding output value predicted by the FLS;

• Units of Levels 1, 2, and 5:The value of the fitness function of the u-th particle in Level e, for e = 1, 2, 5, is evaluated by

Jue = max

(

Jfe,16 , · · · , J

fe,p(e)

6

)

, (22)

where { fe,1, . . . , fe,p(e)} ⊆ {1, . . . , amax} is the subset of all the particles of Level 6 that contain the u-th individualof Level e (see Figure 1);

• Individual rule (Level 4):The value of the fitness function of the d-th particle in Level 4 is defined by

Jd4 = max(J

r1

6 , . . . , Jrq

6 ), (23)

9

Algorithm 2 Hierarchical Particle Swarm Optimization (HPSO) Algorithm.

Inputs: Set of input observations X (9), and a corresponding set of output observations y =[

y1, . . . , yL

]T ∈ RL,

where yl is the output corresponding to xl, for l = 1, . . . , L; Maximum number of particles for each level, amax, bmax,dmax, zmax, smax, mmax; Velocity limits for each level, v

p

min=

[

vp

min,i1

]

, vpmax =

[

vp

max,i1

]

, p = 1, . . . , 6; Maximum numberof iterations, T1; wr,min, wr,max, c1r, and c2r PSO parameters of the r-th particle for each level; Maximum number ofparticles in each level, rmax

Output: Optimized T-S fuzzy model (TSFM);

procedure

for all levels p = 1, 2, . . . , 6 do // For all levels of the HPSOfor all particles r = 1, 2, . . . , rmax, of Level p do // For all particles of the level

if p ∈ {2, 4, 5} or (p = r = 1) or (p = 6 and r = 1)then Initialize x

pr of p-th Level using the results extracted with UFCA Algorithm 1;

else Randomly initialize xpr endif

Initialize vpr of p-th Level randomly within the velocity range of (vp

min, v

pmax);

pbestpr = x

pr ;

end for

end for

Evaluate all particles using one of the equations (21)-(24), according to the level of each particle;Identify the gbestp of the swarm p, for p = 1, . . . , 6;t ← 1;while t ≤ T1 do

for all levels p = 1, 2, . . . , 6 do

for all particles r = 1, 2, . . . , rmax, of Level p do

Compute wpr (t) and v

pr (t) using (20) and (18), respectively;

Update xpr (t) using (19);

end for

end for

Evaluate J(r)p , for each particle r = 1, 2, . . . , rmax, of each level p = 1, 2, . . . , 6, using (21)-(24);

Update pbestpr (t) and gbestp(t) based on the J

(r)p results from (21)-(24), for p = 1, 2, . . . , 6, r = 1, 2, . . . , rmax;

t ← t + 1;end while

end procedure

where {r1, . . . , rq} ⊆ {1, . . . , amax} is the subset of all the particles of Level 6 that involve the d-th individual ruleof Level 4 (indirect involvement through Level 5, see Figure 1);

• Consequent sets (Level 3):The value of the fitness function of the z-th particle in Level 3 is defined by

Jz3 = max(J

g1

6 , . . . , Jgp

6 ), (24)

where {g1, . . . , gp} ⊆ {1, . . . , amax} is the subset of all the particles of Level 6 that involve the z-th consequent setof Level 3 (indirect involvement through Levels 5 and 4, see Figure 1).

Each level of the particles’ hierarchy is evolved separately as an independent PSO algorithm using its own par-ticles, own fitness function, and own PSO parameters. However, since the values of the fitness evaluations of theparticles in each level also depend on the particles in other levels, then the evolution of each level is also influencedby the evolution of all the other levels.

4.3. Adaptive Online the T-S Fuzzy Modelling

The results extracted by the HPSO methodology, Sec. 4.2, and Algorithm 2 will be used to initialize the T-S fuzzymodel for the operation in online mode. For the online process the consequent parameters vector Θ (7) is denoted by

10

Θon, and is self-adjusted, being updated as follows whenever a new data sample becomes available at time instant k:

Θon(k) = Θon(k − 1) + ∆Θon(k), (25)

whereΘon is initialized by the consequent parameters that have resulted from Algorithm 2, and ∆Θon is calculated foreach incoming sample at each time instant k using the S-AHPSO algorithm as described in this sub-section.

PSO has demostrated some important advances by yielding high convergence speed in specific problems. How-ever, it does exhibit some shortages. It has been discovered that PSO has a small capacity to search at a fine grainscale because it lacks an adequate mechanism to control particles’ velocity. In PSO, the inertia weight is an importantparameter which significantly affects the speed of PSO convergence. With many researches, it was proved that theperformance of PSO can be improved if an appropriate adaptive strategy for the inertia weight (Ratnaweera et al.,2004) replaces typical inertia weight strategies like the conventional linearly decreasing inertia weight PSO (LDW-PSO) (Shi and Eberhart, 1998). For example, a variable inertia weight can be formulated to depend on the particle’sposition or velocity. In fact, applying an adaptive inertia weight can balance global exploration and local exploitationabilities of the swarm.

Motivated by these facts, this paper proposes an adaptive process for obtaining the inertia weight. For each newlyavailable data sample at time instant k, S-AHPSO runs PSO iterations t = 1, . . . ,T2. For each iteration t of S-AHPSO,the inertia weight wr for each particle r is obtained from feedback taken from the best memories of the individualparticle and of the swarm as follows:

wr(t) = wr,min + (wr,max − wr,min) exp

− γ∣

∣ y(pbestr)(t) − y(gbest)(t)∣

∣ + β× t − 1

T2 − t + δ(t − T2)

[1 − δ(t − T2)], (26)

where γ and β are positive constants, δ(k) is the Kronecker Delta function, where δ(k) = 0, for all k ∈ N, exceptδ(0) = 1, and T2 is the number of S-AHPSO iterations in t at each time instant k. Also, y(pbestr)(t) is the result obtainedby the best previous position of the r-th particle for all non-future values of t for the current k, and y(gbest)(t) is the bestoutput estimation attained among all particles of the swarm, until the current iteration t.

If the effect of∣

∣ y(pbestr)(t) − y(gbest)(t)∣

∣ is not considered in (26), then, as can be seen, the inertia weight in both (20)and (26) follows a decreasing tendency. On the other hand, if a particle r is far away from the swarm, then the term∣

∣y(pbestr)(t) − y(gbest)(t)∣

∣ in the denominator in (26) will reduce the decreasing tendency of the particles’ inertia weights,and thus increase the degree of exploration in order to give better possibilities for particle r to join the swarm.

For updating of consequent rules in order to perform the online construction of the T-S fuzzy model, Eq. (16) isreformulated as:

vr(t + 1) = wr(t)vr(t) + c1rr1(pbestr(t) − ∆Θr(t)) + c2rr2(gbest(t) − ∆Θr(t)), (27)

and the new position of every particle can be obtained as

∆Θr(t + 1) = ∆Θr(t) + vr(t + 1). (28)

The complete proposed S-AHPSO online adaptive system identification algorithm is presented in Algorithm 3.For each new data sample that becomes available at time instant k, when the S-AHPSO algorithm terminates the PSOiterations t = 1, . . . ,T2 for that sample k, the final ∆Θr(t) value is used as ∆Θon(k) in (25).

The numbers of iterations T1 and T2 of Algorithms 2, and 3, respectively, could be seen as the maximum numbersof iterations if early termination conditions (e.g. as a function of fitness) would be introduced in these algorithms.

The fitness function applied in Algorithm 3 is

J(r) = 1/MSE(r) =[

(yre(k) − y(r)(k))2]−1, (29)

where MSE(r) is the mean square error of the r-th particle, yre(k) is the target output value at instant k, and y(r)(k) is aprediction of yre(k) calculated by the r-th particle using (25) and (2).

11

Algorithm 3 Self-adaptive Hierarchical Particle Swarm Optimization (S-AHPSO) Algorithm.Input: The T-S fuzzy system learned by the HPSO Algorithm 2 (input variables, antecedent parameters, consequentparameters, the fuzzy rules, and the final learned model parameters).Output: ∆Θon(k), for each new data sample that becommes available during the online operation at time instant k;

1. Initialization:(a) Select rmax, the maximum number of particles for the swarm ∆Θr; as well as the maximum number of

iterations T2; Select the velocity limits vmin= [vmin,i1], vmax = [vmax,i1];

(b) Design the PSO parameters wr,max, wr,min, c1r, c2r for each particle r of the swarm ∆Θr;(c) Initialize [Θon](c N)×1 of the T-S fuzzy model using the best result obtained from Level 3 of the Algorithm 2,

where [Θon](c N)×1 is the consequent parameters vector and c and N are the optimal number of individualfuzzy rules and the maximum number of selected input variables, respectively, which all have resultedfrom Algorithm 2;

2. For/using each newly arriving online sample k, do:

for all particles of the swarm, r = 1, 2, . . . , rmax do

Randomly initialize [∆Θr](c N)×1;Initialize vr randomly within the velocity range (vmin, vmax);Let pbestr ← ∆Θr;

end for

Evaluate each particle of the swarm ∆Θr using the fitness function J(r) (29);Identify the best particle gbest;for all t = 1, . . . ,T2 do

Compute wr(t) and vr(t + 1) using (26) and (27), respectively;Update ∆Θr(t) using (28);Calculate the fitness function of each particle of the swarm ∆Θr using the fitness function J(r) (29);Update pbestr and gbest;

end for

Let ∆Θon(k)← ∆Θr(t);Adapt the T-S fuzzy model parameters (Θ of Eq. (7)) using (25) ;

5. Experiments and Results

In this section, the performance of the proposed identification methodology is studied. The study is performedon a soft sensor application. The objective of the experiment is to predict the dynamic behaviour of the dissolvedoxygen concentration in the final compartment of the biological reactor within a wastewater treatment plant (WWTP).Many control strategies have been proposed in the literature of this application but their evaluation and comparison,either practical or based on simulation is difficult. This is due to a number of reasons, including: the variability ofthe influent, the complexity of the biological and biochemical phenomena, the large range of time constants varyingfrom a few minutes to several days, the lack of standard evaluation criteria among other things, due to region specificeffluent requirements and cost levels. The methodology proposed in this paper can be employed to obtain a modelthat can be used to address these concerns. The T-S fuzzy model that results from Algorithms 2 and 3 can be used toinitialize and update an adaptive fuzzy predictive controller design such as the adaptive fuzzy generalized predictivecontroller (AFGPC) which has recently shown successful performance in nonlinear industrial applications (Mendeset al., 2013). A data set of the plant is obtained with the aim of providing a set of input/output data necessary for theHPSO to learn the FLS’s parameters of the T-S fuzzy model. The data set is obtained by applying to the process thecommand signal KLa5, that manipulates the oxygen transfer process from an aerator to the activated sludge inside thebiological reactor, where KLa5 is obtained from the controller proposed in (Belchior et al., 2012), and recording alongtime the values of the input and output variables that constitute the data set.

12

Figure 2: General overview of the BSM1 plant (Belchior et al., 2012).

5.1. General Characteristics

Waste water treatment plants (WWTPs) are industrial structures designed to remove biological or chemical wasteproducts from water. They are complex nonlinear systems subject to large disturbances in the influent flow rateand pollutant load, together with uncertainties concerning the composition of the incoming wastewater (Belchioret al., 2012). BSM1 is a platform-independent simulation environment of WWTPs which has been undertaken inEurope by Working groups of COST Action 682 and 624 that is dedicated to the optimization of performance andcost-effectiveness of wastewater management systems (Jeppsson and Pons, 2004). This development work is nowcontinued under the umbrella of the IWA Task group on Benchmarking of Control Strategies for WWTPs. Thearchitecture proposed in this paper is tested in the Benchmark Simulation Model no.1 (BSM1). A general overviewof the BMS1 plant is presented in Figure 2. The benchmark plant is composed of a five compartment activatedsludge reactor consisting of two anoxic tanks followed by three aerobic tanks. Each compartment is characterizedby flow rate Qk, concentration Zk, volume Vk, and reaction rate rk. Volumes Vk for non-aerated compartments areV1 = V2 = 1.000 [m3], and for aerated compartments are V3 = V4 = V5 = 1.333 [m3]. Compartments 3-4 have afixed oxygen transfer coefficient (KLa = 10 [h−1] = 240 [days−1]) while in compartment 5, the dissolved oxygen (DO)concentration, DOC , is controlled by manipulation of the KLa5. For more details about the BMS1 plant, references(Jeppsson and Pons, 2004; Belchior et al., 2012; Mendes et al., 2014a) are recommended. The sampling period is15 [min], and the simulations have a maximum of 14 [days].

5.2. Application to Wastewater Treatment System

For the learning process of the HPSO algorithm, a data set has been gathered from the plant opera-tion and consists of 10 main variables (u1, . . . , u10), where 9 of them are input variables (u1, . . . , u9) =(S S , XS , XI , S NO, S NH , XND, S ND, XB,H ,KLa5), and the remaining variable is the target output variable to be estimated,y = u10 = DOC . Figure 3 illustrates the input variables of the data set. The plant variables are described in Table 1.

The available data set is constructed by using of the first three delayed versions of each of the variables u1, . . . , u9,as candidates for the inputs of the Takagi-Sugeno (T-S) model. Therefore, the following combinations of processvariables and delays were specified to be used as the main candidates for inputs of the Takagi-Sugeno fuzzy model topredict y(t): [u1(t − 1), u1(t − 2), u1(t − 3), . . . , u9(t − 1), u9(t − 2), u9(t − 3)]. The total data set includes 1344 sampleswhere the first 480 samples are used as the training data set to learn the T-S fuzzy model parameters by the HPSOalgorithm and the remaining 864 samples are used as the test data set for S-AHPSO. The selected degree of fuzzinesswas set to η = 2 in Algorithm 1. The PSO parameters wr,max = 1, wr,min = 0.3, c1r = 1.5, c2r = 2 are used incommon among all particles in both Algorithm 2 and Algorithm 3. In Algorithms 2 and 3, all components of theminimum and maximum velocities are v

p

min,i1 = vmin,i1 = −2, and vp

max,i1 = vmax,i1 = 2, respectively. The maximumnumber of iterations of the HPSO algorithm is T1 = 200. For the S-AHPSO algorithm, γ = 1, β = 0.1, and maximumnumber of iterations T2 = 50 were considered. The numbers of particles for each level of the HPSO architecture are:amax = bmax = 30, dmax = zmax = 50, smax = 40, and mmax = 20. Also, rmax = 25 is considered for the S-AHPSOarchitecture. The aforementioned parameters in this paragraph were tuned by the designer. On the other hand, the

13

0 2 4 6 8 10 12 14Time (d)

0.60.81.01.21.41.61.82.02.2

Conc

entr

atio

n (m

g.l�

1)

SS

0 2 4 6 8 10 12 14Time (d)

20

30

40

50

60

70

Conc

entr

atio

n (m

g.l�

1)

XS

0 2 4 6 8 10 12 14Time (d)

50

60

70

80

90

100

110

Conc

entr

atio

n (m

g.l�

1)

XI

0 2 4 6 8 10 12 14Time (d)

2468

1012141618

Conc

entr

atio

n (m

g.l�

1)

SNO

0 2 4 6 8 10 12 14Time (d)

02468

10121416

Conc

entr

atio

n (m

g.l�

1)

SNH

0 2 4 6 8 10 12 14Time (d)

0.4

0.6

0.8

1.0

1.2

1.4

1.6

Conc

entr

atio

n (m

g.l�

1)

XND

0 2 4 6 8 10 12 14Time (d)

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

Conc

entr

atio

n (m

g.l�

1)

XB,H

0 2 4 6 8 10 12 14Time (d)

1.01.52.02.53.03.54.04.55.05.5

Conc

entr

atio

n (m

g.l�

1)

SND

0 2 4 6 8 10 12 14Time (d)

100

150

200

250

300

350

400

Conc

entr

atio

n (m

g.l�

1)

KLa5

Figure 3: Input variables of the data set applied to the HPSO algorithm for learning.

Table 1: List of BSM1 variables.

Notation DescriptionS S Readily biodegradable substrate;XS Slowly biodegradable substrate;XI Particulate inert organic matter;

XB,H Active heterotrophic biomass;XB,A Active autotrophic biomass;XP Particulate products arising from biomass decay;S O Oxygen;S I Soluble inert organic matter;

S NO Nitrate and nitrite nitrogen;S NH NH4+ + NH3 nitrogen;S ND Soluble biodegradable organic nitrogen;XND Particulate biodegradable organic nitrogen;KLa Oxygen transfer coefficient;

S ALK Alkalinity;DOC Dissolved oxygen concentration.

optimal number of clusters c∗ is self-adjusted by Algorithm 1 that obtained a value of c∗ = 10. Also, the consequentparameters are self-adjusted, being calculated by Algorithm 3.

14

Time [d]5 6 7 8 9 10 11 12 13 14

y=

DOC

2

2.5

3

3.5

4

4.5

5

5.5

6

6.5Real outputPrediction (S-AHPSO)

R

(a)

Time [d]5.6 5.8 6 6.2 6.4 6.6 6.8 7

y=

DOC

4.2

4.3

4.4

4.5

4.6

4.7

4.8

4.9R

Real output Prediction (S-AHPSO)

(b)

Figure 4: (a) Modeling performance of the proposed system identification methodology for the y = DOC variable of the wastewater treatmentsystem data set; (b) magnified view comparing the real output and predicted output variables corresponding to the rectangle “R” depicted insideFigure 4(a).

Table 2: Comparison results on the test data set for the wastewater treatment plant (WWTP). The MSE values have been multiplied by 105.

Method Number Number Inputs MSE CT [minutes]of rules of inputs

FCM-FRLS (Kulhavý, 1987) - - all candidate variables 67.4 2.803UFCA-FRLS (Kulhavý, 1987) - - all candidate variables 63.5 3.655RPLS (Dayal and MacGregor, 1997) - - all candidate variables 838.2 4.375MLPNN (Nelles, 2001) - - all candidate variables 81.5 16.247NFCRMA (Random initialization) (Li et al., 2009) - - all candidate variables 1689.2 10.36UFCA-NFCRMA (Li et al., 2009) - - all candidate variables 267.6 10.55APSO (FCM initialization) (Alfi and Modares, 2011) - - all candidate variables 135.9 13.75UFCA-APSO (Alfi and Modares, 2011) - - all candidate variables 97.6 14.88ELM (Huang et al., 2012) - - all candidate variables 2142.2 5.953HGA (Mendes et al., 2012) 32 13 u1(t − 2), u2(t − 1), u2(t − 2), u3(t − 1) 74.1 24.66

u3(t − 3), u4(t − 3), u5(t − 1), u6(t − 1)u7(t − 1), u7(t − 3), u8(t − 1), u9(t − 1), u9(t − 2)

Proposed Method 20 8 u2(t − 1), u3(t − 1), u3(t − 3), u4(t − 1), u5(t − 2)u7(t − 1), u9(t − 2), u9(t − 3) 44.5 13.07

Figure 4 illustrates the predicted values obtained by the proposed method and desired (real) values of the targetvariable y = DOC to be estimated, for the data set of the WWTP experiment. Figure 4b presents a magnified viewcomparing the real output and predicted output variables corresponding to the rectangle “R” depicted inside Figure 4a.Numerical results, with the best fitness function, comparing the performance of the proposed method and the worksFCM-RLS (Kulhavý, 1987), RPLS (Dayal and MacGregor, 1997), MLPNN (Nelles, 2001), NFCRMA (Li et al.,2009), APSO (Alfi and Modares, 2011), ELM (Huang et al., 2012), and HGA (Mendes et al., 2012) are presented inTable 2. For a test dataset with L samples, define Jbest =

1MSE as the fitness function, and MSE =

∑Lk=1(y(k) − yre(k))2,

where yre is the real plant output value. The main objective here is to obtain a large value of Jbest which correspondsto the minimum of the MSE. Based on the results in Table 2, the largest value of the fitness function in the test dataset is obtained with the method proposed in this paper. Also, comparing with the HGA, the proposed method uses alower number of variables and fuzzy rules, but shows a better prediction performance and faster learning performance.The computational time (CT) of each of the used methods are given in Table 2. With a Intel(R) core (TM) i7-2600and CPU 3.4GHz, the simulation time of the proposed algorithm was 53% of the simulation time of the HGA. Also,from the results in Table 2 it is seen that 8 input variables u2(t − 1), u3(t − 1), u3(t − 3), u4(t − 1), u5(t − 2), u7(t − 1),u9(t − 2), and u9(t − 3), were selected by the proposed methodology (among all 27 the input (variable,delay) paircandidates that were considered) for the T-S fuzzy model of the BSM1 process that was learned. The membershipfunctions extracted by Level 2 for variables selected by Level 1 for the whole T-S fuzzy model obtained by the HPSO

15

100 150 200 250 300 350KLa5(t−3)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(a)

20 30 40 50 60 70XS(t−1)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(b)

60 70 80 90 100XI(t−1)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(c)

2 4 6 8 10 12 14 16 18SNO(t−1)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(d)

0 2 4 6 8 10 12 14 16SNH(t−2)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(e)

1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0SND(t−1)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(f)

60 70 80 90 100XI(t−3)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(g)

100 150 200 250 300 350KLa5(t−2)

0.0

0.2

0.4

0.6

0.8

1.0

Fuzzy Value

HPSO

UFCA-HPSO

(h)

Figure 5: Membership functions of the partition set of the selected variables obtained by the HPSO method with random initialization and by usingthe UFCA as the initialization algorithm.

with random initialization, and by the HPSO initialized by the UFCA algorithm are shown in Figure 5.Some points related to the results of the methods’ performance presented in Table 2 are worth to note. The

MLPNN method was run based on two different types of iterations. The method has an external iterative cycle, witha maximum number of 1000 iterations for this example which is considered as a stop criterion for the MLPNN.Also the MLPNN learning is by the Levenberg-Marquardt method which has an inner additive iteration to updateneural network weights. Updated neural network weights which result from the inner iteration will initialize theweights of the next external iteration. Implementation of the HGA was also accompanied with more iterations thanwas considered firstly. The method was initally programmed for 1000 iterations first. However, due to the randominitialization, the result by the HGA method just depicted after 100 times runs of the complete algorithm with 1000iterations each. The result in Table 2 is the best result among all repetitions of the HGA. The performance of the APSOmethod (Alfi and Modares, 2011) has shown a variety of sensitivity phenomena to some parameters inside the APSOalgorithm. The best result was obtained just after all observations on the APSO’s performance variations resultingfrom the variations of the aforementioned significant parameters. However, the resulting fitness evaluation obtainedby the Algorithm 2 and the Algorithm 3, and shown in Table 2, is simply obtained with just one test of the proposedmethod with T1 = 200 and T2 = 50, respectively. Furthermore, the APSO’s initialization was performed with thepresented UFCA method. A final point should be mentioned regarding to the efficiency of the UFCA method: basedon results in Table 2, the efficient role of the UFCA to enhance performance of other regressors is also remarkable.

In another test, it is investigated the robustness of Algorithm 3 to changes in the system being modeled. Nor-mally, the oxygen transfer coefficient in compartment 4 has been constant KLa4 = 10 [h−1] = 240 [days−1] for0 ≤ d ≤ 14 [days]. However, now a disturbance dKla4(k) = −240 [days−1] is defined as a change in the oxygentransfer coefficient in compartment 4 to KLa4 = 0 [days−1], where the change is in effect during 7 ≤ d ≤ 9 [days].Also the external carbon (EC) source to reactor 5 which was constant equal to 0 [kg COD.days−1] (COD is theabbreviation for chemical oxygen demand), for 0 ≤ d ≤ 14 [days], now is changed to a disturbance value ofdCarbon5(k) = 20 [kg COD.days−1], where the change is in effect during 11 ≤ d ≤ 13 [days]. Figure 6, shows theeffect of the disturbances on the BSM1 during the application of the Algorithm 3. In Figure 6, values of both dKla4(k)and dCarbon5(k) are presented after multiplying them by −0.01 and 0.1 factors, respectively. The results reveal thatusing Algorithm 3, the T-S fuzzy model constructed by S-AHPSO can be adapted well in response to changes in theplant and can track the output accurately.

16

5 6 7 8 9 10 11 12 13 14Time [d]

0

1

2

3

4

5

6

7

Valu

e

y=DOC Real output [g.cm3 ]y=DOC (S-AHPSO) [g.cm3 ]dKla4(k)/(100) Disturbance [days1 ]dCarbon5(k)/10 Disturbance [kg COD.days1 ]

(a)

Figure 6: Results of the S-AHPSO Algorithm 3 in the adaptation of the learned model for estimating the y = DOC variable of the wastewatertreatment system data set in presence of disturbances.

6. Conclusion

An adaptive methodology for online learning of Takagi-Sugeno (T-S) fuzzy models based on a hierarchical particleswarm optimization (HPSO) algorithm was proposed. To improve the convergence time of the proposed HPSO,particles positions are initialized by using an efficient unsupervised fuzzy clustering algorithm (UFCA). While theHPSO was proposed to automatically extract all the parameters of the T-S fuzzy model from a set of input/outputdata, as another contribution of this paper, a self-adaptive HPSO (S-AHPSO) algorithm was proposed for onlineidentification of T-S fuzzy models. To validate and demonstrate the performance and effectiveness of the proposedalgorithms, identification of the dissolved oxygen in an activated sludge reactor within a wastewater treatment plant(WWTP) was studied. The results show that the proposed method successfully extracted all parameters of the T-Sfuzzy model. The proposed framework can extract a model that represents the dynamics of a plant, using only a dataset of the process, where the model can be further used to estimate the output of the plant. Moreover, the resultsreveal superior performance of the proposed methods when compared with other state of the art methods. As a futurework, the fuzzy modeling methodology proposed in this paper will be used to design an adaptive fuzzy generalizedpredictive controller as a proposed solution to address the current concerns in control design inside of BSM1 WWTPenvironment.

Acknowledgment

This work was supported by Project SCIAD “Self-Learning Industrial Control Systems Through Process Data”(reference: SCIAD/2011/21531) co-financed by QREN, in the framework of the “Mais Centro - Regional OperationalProgram of the Centro”, and by the European Union through the European Regional Development Fund (ERDF).

Saeid Rastegar was supported by Fundação para a Ciência e a Tecnologia (FCT) under grant SFRH/BD/89186/2012,and Jérôme Mendes was supported by FCT under grants SFRH/BD/63383/2009 and SFRH/BPD/99708/2014.

References

Alfi, A., Modares, H., 2011. System identification and control using adaptive particle swarm optimization. Applied Mathematical Modelling 35,1210–1221.

17

Ali, S.F., Ramaswamy, A., 2009. Optimal fuzzy logic control for mdof structural systems using evolutionary algorithms. Engineering Applicationsof Artificial Intelligence 22, 407–419.

Barros, J.C., Dexter, A.J., 2007. On-line identification of computationally undemanding evolving fuzzy models. Fuzzy Sets and Systems 158,1997–2012.

Belchior, C.A.C., Araújo, R.A.M., Landeck, J.A.C., 2012. Dissolved oxygen control of the activated sludge wastewater treatment process usingstable adaptive fuzzy control. Computers & Chemical Engineering 37, 152–162.

Chang, W.J., Ku, C.C., Huang, P.H., 2010. Robust fuzzy control for uncertain stochastic time-delay takagi-sugeno fuzzy models for achievingpassivity. Fuzzy Sets and Systems 161, 2012–2032.

Cover, T., Hart, P., 1967. Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13, 21–27.Dayal, B.S., MacGregor, J.F., 1997. Recursive exponentially weighted pls and its applications to adaptive control and prediction. Journal of Process

Control 7, 169–179.Delgado, M.R., Nagai, E.Y., de Arruda, L.V.R., 2009. A neuro-coevolutionary genetic fuzzy system to design soft sensors. Soft Computing - A

Fusion of Foundations, Methodologies and Applications 13, 481–495.Delgado, M.R., Zuben, F.V., Gomide, F., 2001. Hierarchical genetic fuzzy systems. Information Sciences 136, 29–52.Dovzan, D., Logar, V., Skrjanc, I., 2015. Implementation of an evolving fuzzy model (efumo) in a monitoring system for a waste-water treatment

process. IEEE Transactions on Fuzzy Systems 23, 1761–1776.Dovžan, D., Škrjanc, I., 2011. Recursive fuzzy c-means clustering for recursive fuzzy identification of time-varying processes. ISA Transactions

50, 159–169.Eberhart, R. ., Shi, Y., 1998. Comparison between genetic algorithms and particle swarm optimization, in: Evolutionary Programming VII. volume

1447 of Lecture Notes in Computer Science, pp. 611–618.Hua, H., Liu, Y., Lu, J., Zhu, J., 2013. A new impulsive synchronization criterion for t-s fuzzy model and its applications. Applied Mathematical

Modelling 37, 8826–8835.Huang, G.B., Hongming Zhou, X.D., Zhang, R., 2012. Extreme learning machine for regression and multiclass classification. IEEE Transactions

on Systems, And Cybernetics-Part B: Cybernetics 42, 513–529.Jeppsson, U., Pons, M., 2004. The COST benchmark simulation model-current state and future perspective. Control Engineering Practice 12,

299–304.Kennedy, J., Eberhart, R., 1995. Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, pp.

1942–1945.Kulhavý, R., 1987. Restricted exponential forgetting in real-time identification. Automatica 23, 589–600.Li, C., Zhou, J., Xiang, X., Li, Q., An, X., 2009. T-s fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm.

Engineering Applications of Artificial Intelligence 22, 646–653.Li, Y., Du, Y., 2012. Indirect adaptive fuzzy observer and controller design based on interval type-2 t-s fuzzy model. Applied Mathematical

Modelling 36, 1558–1569.Mendes, J., Araújo, R., Matias, T., Seco, R., Belchior, C., 2014a. Automatic extraction of the fuzzy control system by a hierarchical genetic

algorithm. Engineering Applications of Artificial Intelligence 29, 70–78.Mendes, J., Araújo, R., Matias, T., Seco, R., Belchior, C., 2014b. Automatic extraction of the fuzzy control system by hierarchical genetic

algorithm. Engineering Applications of Artificial Intelligence 29, 70–78.Mendes, J., Araújo, R., Souza, F., 2013. Adaptive fuzzy identification and predictive control for industrial processes. Expert Systems with

Applications 40, 6964–6975.Mendes, J., Souza, F., Araújo, R., Gonçalves, N., 2012. Genetic fuzzy system for data-driven soft sensors design. Applied Soft Computing 12,

3237–3245.Nelles, O., 2001. Nonlinear System Identification: from Classical to Neural Networks and Fuzzy Models. Springer, Chicago, USA.Rastegar, S., Araújo, R., Mendes, J., 2016. A new approach for online t-s fuzzy identification and model predictive control of nonlinear systems.

Journal of Vibration and Control 22, 1820–1837.Rastegar, S., Araújo, R., Mendes, J., Matias, T., Emami, A., 2014. Self-adaptive takagi-sugeno model identification methodology for industrial

control processes, in: 40th Annual Conference of the IEEE Industrial Electronics Society (IECON 2014), IEEE. pp. 281–287.Ratnaweera, A., Halgamuge, S.K., Watson, H.C., 2004. Self-organizing hierarchical particle swarm optimizer with time-varying acceleration

coefficients. IEEE Transactions on Evolutionary Computation 8, 240–255.Salahshoor, K., Hamzehnejad, M., Zakeri, S., 2012. Online affine model identification of nonlinear processes using a new adaptive neuro-fuzzy

approach. Applied Mathematical Modelling 36, 5534–5554.dos Santos Coelho, L., Herrera, B.M., 2007. Fuzzy identification based on a chaotic particle swarm optimization approach applied to nonlinear

yo-yo motion system. IEEE Transactions on Industrial Electronics 54, 3234–3245.Shi, Y., Eberhart, R. ., 1998. A modified particle swarm optimizer, in: Proceedings of the IEEE International Conference on Evolutionary

Computation,USA, pp. 69–73.Shuzhi, G., Xing, D., Xianwen, G., 2012. The temperature system identification of the pvc stripper tower top based on pso-fcm optimized t-s

model, in: Proceedings of the IEEE International Conference on Control and Decision (CCD), pp. 2529–2532.Škrjanc, I., 2009. Confidence interval of fuzzy models: An example using a waste-water treatment plant. Chemometrics and Intelligent Laboratory

Systems 96, 182–187.Souza, F., Santos, P., Araújo, R., 2010. Variable and delay selection using neural networks and mutual information for data-driven soft sensors, in:

Proc. IEEE Conference on Emerging Technologies and Factory Automation (ETFA), 2010, pp. 1–8.Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its applications to modelling and control. IEEE Transactions on Systems, Man,

and Cybernetics 15, 116–132.Tang, Y., Sun, F., 2005. Improved validation index for fuzzy clustering, in: American Control Conference, IEEE. pp. 1120–1125.Wang, L.X., 1996. Stable adaptive fuzzy controllers with application to inverted pendulum tracking. IEEE Transactions on Systems, Man, and

Cyberneetics, Part B: Cybernetics 26, 677–691.

18

Windham, M.P. ., 1982. Cluster validity for the fuzzy c-means clustering algorithm. IEEE Transactions Pattern Analysis and Machine IntelligencePAMI-4, 357–363.

Yao, Y., Mi, J., Li, Z., 2014. A novel variable precision (θ, σ)-fuzzy rough set model based on fuzzy granules. Fuzzy Sets and Systems 236, 58–72.Ying, H., 1997. General miso takagi-sugeno fuzzy systems with simplified linear rule consequent as universal approximators for control and

modelling applications, in: Proc. IEEE International Conference on Systems, Man, and Cybernetics, pp. 1335–1340.

19