Fuzzy Modeling using Vector Quantization and Local Linear ...simpliﬁed fuzzy modeling [12], [14]. In this paper, the concept of Vector Quantization such as NG and k-means with LLM

Fuzzy Modeling using Vector Quantization andLocal Linear Mapping

Hirofumi Miyajima, Noritaka Shigei, Hiromi Miyajima

Abstract—Fuzzy modeling has been extensively studied. Ithas been shown that fuzzy modeling using vector quantization(VQ) and steepest descent method (SDM) is effective in termsof the number of rules (parameters). In the methods, the initialparameters of fuzzy rules by using VQ with learning datafirstly are determined, then the parameters are adjusted byusing SDM. On the other hand, Neural Gas (NG) is knownas a novel approach of VQ and NG with local linear mapping(LLM) has been applied to a time-series prediction problem.In the application, the predicted value in each of subregions isapproximated by using a corresponding linear mapping. It hasbeen demonstrated that, compared with RBF, NG with LLMis advantageous in terms of the accuracy and the number ofrules. The idea of NG with LLM has been applied to fuzzymodeling with TS fuzzy model, and its effectiveness has beendemonstrated. However, the effectiveness of this approach hasnot been been confirmed for simpler fuzzy model such assimplified fuzzy model. In this paper, the method using NGwith LLM to fuzzy modeling with simplified fuzzy inferencemodel is proposed and the similar method using k-meansinstead of NG is also proposed. In the proposed methods, theinitial parameters of fuzzy rules including the weights in theconsequent part by using NG (k-means) are firstly defined,and then the parameters by using SDM are adjusted. Theeffectiveness of the proposed methods with simplified fuzzymodeling is demonstrated in numerical simulations of functionapproximation and classification problems.

Index Terms—Simplified Fuzzy Inference Model, VectorQuantization, Neural Gas, k-means, Local Linear Mapping

I. INTRODUCTION

W ITH increasing interest in Artificial Intelligence (AI),extensive studies related to Machine Learning (ML)have been done. In the field of ML, supervised learningsuch as backpropagation, unsupervised one such as K-means,and Reinforcement Learning (RL) are well known. MLfor fuzzy inference models, called fuzzy modeling, is oneof the supervised ones, and extensive studies on fuzzymodeling have also been made [1], [2]. Although mostof fuzzy modeling methods are based on steepest descentmethod (SDM), their obvious drawbacks are its large timecomplexity, getting stuck easily in shallow local minima anddifficulty of applying to high dimensional data. In order toovercome the drawbacks, the following has been developed:1) creating fuzzy rules one by one starting from a fewrules, or deleting fuzzy rules one by one starting frommany rules [3], 2) using evolutionary algorithms such asGA and PSO to determine parameters of fuzzy inferencesystems [4], 3) fuzzy inference systems composed of rule

Hirofumi Miyajima is Shiool of Information and Data Sciences, NagasakiUniversity, 1-14, Bunkyomachi, Nagasaki city, Nagasaki, Japan e-mail:[email protected]

Noritaka Shigei is with Kagoshima University, email:[email protected]

Hiromi Miyajima is with Former Kagoshima University, email:[email protected]

modules with a few inputs such as SIRMs (Single InputRule Modules) and DIRMs (Double Input Rule Modules)methods [5], and 4) determining the initial assignment oflearning parameters by using self-organization map (SOM)or VQ [6], [7]. Especially, with fuzzy modeling using VQmethods such as k-means, Neural-Gas (NG), SOM and fuzzyc-means, there are many studies on how to realize the systemwith high accuracy and a few rules [8], [9]. In the methods,the initial parameters of the antecedent part of rules by VQusing only input or both input and output of learning dataare firstly determined, and then all the parameters by usingSDM and adjusted. Further, in addition to initial assignmentsof the antecedent part, it also has been proposed to determinethe initial assignment of weight parameters of the consequentpart by using VQ [8], [9].

Neural Gas (NG) is known as a novel approach of VQ andNG with local linear mapping (LLM) has been applied to atime-series prediction problem [10]. In the application, thepredicted value in each of subregions is approximated by us-ing a corresponding linear mapping. It has been demonstratedthat, compared with k-means with RBF, NG with LLM isadvantageous in terms of the number of subregions, eachof which is approximated by using a linear mapping. Thesimilar idea has been applied to fuzzy modeling with TSfuzzy model, and its effectiveness has been demonstrated[11]. However, the effectiveness of the similar approach hasnot been been confirmed for simpler fuzzy model such assimplified fuzzy modeling [12], [14].

In this paper, the concept of Vector Quantization such asNG and k-means with LLM is applied to fuzzy modelingwith simplified fuzzy inference model. Before adjustingthe parameters by using SDM, the initial parameters offuzzy rules including the weights in the consequent partare determined by using NG (k-means) and supervisedlearning in the proposed methods. The determination of theinitial parameters of the consequent part firstly divides theinput space into Voronoi subregions, and then determines aconstant value well approximating the output value of eachsubregion by using supervised learning. The effectivenessof the proposed method with simplified fuzzy modelingis demonstrated in numerical simulations of function ap-proximation and classification problems. In Section II, theconventional learning methods of fuzzy model, NG, k-meansand LLM are introduced. In Section III, a learning methodfor the simplified fuzzy models using NG and k-means areproposed. In Section IV, numerical simulations for functionapproximation and classification problems are performed todemonstrate the effectiveness of the proposed methods.

IAENG International Journal of Computer Science, 47:4, IJCS_47_4_27

Volume 47, Issue 4: December 2020

______________________________________________________________________________________

II. PRELIMINARIES

A. The simplified fuzzy inference models

The conventional (simplified) fuzzy inference model usingSDM is described [1]. Let Zj = {1, · · · , j} and Z∗j ={0, 1, · · ·, j} for the positive integer j. Let R be the set of realnumbers. Let x = (x1, · · · , xm) and yr be input and outputdata, respectively, where xi∈R for i ∈ Zm and yr∈R. Thenthe rule of fuzzy inference model is expressed as

Ri : if x1 is Mi1 and· · ·xj is Mij · · · and xm is Mimthen y is wi, (1)

where i ∈ Zn is a rule number, j is a variable number,Mij is a membership function of the antecedent part, and wiis the weight of the consequent part.

A membership value µi of the antecedent part for input xis expressed as

µi =m∏j=1

Mij(xj). (2)

The output y∗ of fuzzy inference is obtained as follows:

y∗ =

∑ni=1 µiwi∑ni=1 µi

(3)

If Gaussian membership function is used, then Mij isexpressed as follow:

Mij(xj) = exp

(−12

(xj − cij

bij

)2)(4)

where cij and bij denote the center and the width values ofMij , respectively.

In order to realize the effective model, conventional learn-ing is introduced.

Let D = {(xp1, · · · , xpm, yrp)|p ∈ ZP } be the set oflearning data. The objective of learning is to minimize thefollowing mean square error (MSE). The objective functionE is determined to evaluate the inference error between thedesirable output yr and the inference output y∗ as follows :

E =1

P

P∑p=1

(y∗p − yrp)2. (5)

In order to minimize the objective function E, eachparameter α ∈ {cij , bij , wi} is updated based on SDM(Steepest Descent Method) as follows [1]:

α(t+ 1) = α(t)−Kα∂E

∂α(6)

where t is iteration time and Kα is a constant. When theGaussian membership function is used as the membershipfunction, the following relation holds.

∂E

∂wi=

µi∑ni=1 µi

(y∗ − yr) (7)

∂E

∂cij=

µi∑ni=1 µi

(y∗ − yr)(wi − y∗)xj − cij

b2ij(8)

∂E

∂bij=

µi∑ni=1 µi

(y∗ − yr)(wi − y∗)(xj − cij)2

b3ij(9)

where i∈Zn and j∈Zm.The conventional learning algorithm is shown as follows

[1], where θ and Tmax are the threshold and the maximum

number of learning, respectively. Note that the method is agenerative one, which creates fuzzy rules one by one startingfrom any number of rules. The method is called learningalgorithm A.Learning Algorithm AInput : Learning data D = {(xp1, · · ·, xpm, yrp)|p∈ZP }Output : Parameters c, b and wStep A1 : The initial number n of rules is set to n0. Lett = 1.Step A2 : The parameters cij , bij and wi are set randomly.Step A3 : Let p = 1.Step A4 : A data (xp1, · · ·, xpm, yrp)∈D is given.Step A5 : From Eqs.(2) and (3), µi and y∗ are computed,respectively.Step A6 : Parameters wi, cij and bij are updated by Eqs.(7),(8) and (9), respectively.Step A7 : If p = P , then go to Step A8 and if p < P , thengo to Step A4 with p←p+ 1.Step A8 : Let E(t) be inference error at step t calculatedby Eq.(5). If E(t) > θ and t < Tmax, then go to Step A3with t←t+1 else if E(t)≤θ, then the algorithm terminates.Step A9 : If t > Tmax and E(t) > θ, then go to Step A2with n←n+ 1 and t = 1.

B. Neural gas and K-means

Vector quantization techniques encode a data space, e.g., asubspace V⊆Rd, utilizing only a finite set U = {ui|i∈Zr}of reference vectors (also called cluster centers), where d andr are positive integers.

Let the winner vector ui(v) be defined for any vector v∈Vas follows:

i(v) = arg mini∈Zr||v − ui|| (10)

From the finite set U , V is partitioned as follows:

Vi = {v∈V |||v − ui||≤||v − uj || for j∈Zr} (11)

For NG [10], the following method is used:Given an input data vector v, we determine the

neighborhood-ranking uik for k∈Z∗r−1, being the referencevector for which there are k vectors uj with

||v − uj || < ||v − uik || (12)

If we denote the number k associated with each vector uiby ki(v,ui), then the adaption step for adjusting the ui’s isgiven by

△ui = εhλ(ki(v,ui))(v − ui) (13)hλ(ki(v,ui)) = exp (−ki(v,ui)/λ)) (14)

ε = εint

(εfinεint

) tTmax

where ε∈[0, 1] and λ > 0. The number λ is called decayconstant.

The evaluation function for the partition is defined asfollows:

E =∑ui∈U

∑v∈V

hλ(ki(v,ui))∑ul∈U hλ(kl(v,ul))

||v − ui(v)||2 (15)

If λ→0, Eq.(13) becomes equivalent to the K-means [10].



______________________________________________________________________________________

In k-means, only the winner ui0 is updated in learning.In NG, not only the winner ui0 but the second, third nearestreference vector ui1 , ui2 , · · · are also updated in learning.

Let D∗ = {xp|p∈ZP } for the set D. Let p(v) be theprobability distribution for v∈V . Then, NG is introduced asfollows [10]:Learning Algorithm B∗ (Neural Gas)Input : The set V of dataOutput : The set U of reference vectorsStep B∗1 : The initial values of reference vectors are setrandomly. The learning coefficients εint and εfin are set,respectively. Let Tmax and θ be the maximum number oflearning time and the threshold, respectively.Step B∗2 : Let t = 1.Step B∗3 : Give a data v∈V based on p(x) andneighborhood-ranking ki(v,ui) is determined.Step B∗4 : Each reference vector ui for i∈Zr is updatedbased on Eq.(13)Step B∗5 : If t≥Tmax, then the algorithm terminates andthe set U = {ui|i∈Zr} of reference vectors is obtained.Otherwise go to Step B∗3 as t←t+ 1.

Likewise, k-means method is introduced.

If the data distribution p(v) is not given in advance, astochastic sequence of input data v(1),v(2), · · · which isbased on p(v) is given.

By using Learning Algorithm B∗, the learning method ofthe fuzzy system is introduced as follows [8]. In this case,assume that the distribution of the set D∗ of learning data isa discrete uniform one. Let n0 be the initial number of rulesand n = n0.Learning Algorithm B (Fuzzy modeling of using NG)Input : Learning data D∗ = {(xp1, · · ·, xpm)|p∈ZP }Output : Parameters c, b and wStep B1 : For learning data D∗, Learning Algorithm B∗is performed and the set U of reference vectors is obtained,where |U | = n.Step B2 : Each element of the set U is set to the centerparameter c of each fuzzy rule. Further, the width parameterbij is set as follows :

bij =1

mi

∑xk∈Ci

(cij − xkj)2, (16)

where Ci and mi are the set of learning data belonging tothe i-th cluster and its cardinal number, respectively. Eachinitial value of wi is selected randomly. Let t = 1.Step B3 : The Steps A3 to A8 of learning algorithm A areperformed.Step B4 : If t≥Tmax and E(t) > θ, then go to Step B1with n←n+ 1.

Likewise, fuzzy modeling using k-means is proposed.

C. Adaptive local linear mapping

In this section, the learning method to adaptively approx-imate the function y = f(v) with v∈V⊆Rd and y∈Rusing NG is introduced based on Ref. [10]. The set Vdenotes the function’s domain. Let n be the number ofcomputational units, each containing a reference vector ui

together with a constant ai0 and d-dimensional vectors ai.Learning Algorithm B∗ assigns each unit i to a subregion Vias defined in Eq.(11), and the coefficients ai0 and ai definea linear mapping

g(v) = ai0 + ai(v − ui) (17)

from Rd to R over each of the Voronoi diagram Vi. Hence,the function y = f(v) is approximated by ỹ = g(v) with

g(v) = ai(v)0 + ai(v)(v − ui(v)) (18)

where i(v) denotes unit i with its ui closest to v.To learn the input-output mapping, a series of learning

steps by approximating D = {(xp, yrp)|p∈ZP } is performed.In order to obtain the output coefficients ai0 and ai, themean squared error 1|V |

∑v∈V (f(v)− g(v))

2 between thedesirable and the obtained output, averaged over subregionV i, to be minimal is required for each i. Gradient descentwith respect to ai0 and ai yields [10].

△ai0 = ε′hλ′(ki(v,ui))(y − ai0 − ai(v − ui)) (19)△ai = ε′hλ′(ki(v,ui))(y − ai0 − ai(v − ui))(v − ui)

(20)

where ε′ > 0 and λ′ > 0.In order to apply the method with adaptively local linear

mapping (LLM) to fuzzy modeling of the simplified fuzzysystem, all parameters ai’s are not used. That is, the con-stants ai0’s only are updated.

The algorithm is introduced as follows :Learning Algorithm CInput : Learning data D = {(xp, yp)|p∈ZP } and D∗ ={xp|p∈ZP }.Output : The set U of reference vectors and the coefficientai0 for i∈Zn.Step C1 : The set U of reference vectors is determinedusing D∗ by Algorithm B∗. The subregions Vi for i∈Zn isdetermined using U , where Vi is defined by Eq.(11), V d =∪ni=1Vi and Vi∩Vj = ∅ (i̸=j).Step C2 : Each parameter ai0(i∈Zn) is set randomly. Lett = 1.Step C3 : A learning data (x, y)∈D is selected based onp(x). The rank ki(x,ui) of x for the set Vi is determined.Step C4 : Each parameter ai0 for i∈Zn is updated basedon the following Eq.(21).

ai0(t+ 1) = ai0(t) + ε′hλ(ki(v,ui))(y − ai0) (21)

Step C5 : If t≥Tmax, then the algorithm terminates else goto Step C3 with t←t+1, where Tmax means the maximumnumber of learning times.

Remark that Algorithm C is one of the learning methodsusing NG and SDM [10]. Likewise, the learning methods us-ing other VQ such as k-means and SDM are also considered.

D. The appearance probability of learning data based onthe rate of change of output

Learning Algorithm B is a method that determines theinitial assignment of fuzzy rules by vector quantization usingthe set D∗. In the previous paper, we proposed a methodusing both input and output parts of D to determine the



______________________________________________________________________________________

initial assignment of parameters of the antecedent part offuzzy rule [8].

Based on Ref. [8], the appearance probability is definedas follows :Algorithm for Appearance Probability (Algorithm AP)Input : Learning data D = {(xp1, · · · , xpm, yrp)|p ∈ ZP } andD∗ = {(xp1, · · ·, xpm)|p∈ZP }Output : Appearance probability pM (x) for x∈D∗Step 1 : Give an input data xi∈D∗, we determine theneighborhood-ranking (xi0 ,xi1 , · · ·,xik , · · ·,xiP−1) of thevector xi with xi0 = xi, xi1 being closest to xi andxik (k = 0, · · ·, P − 1) being the vector xi for which thereare k vectors xj with ||xi − xj || < ||xi − xik ||.Step 2 : Determine H(xi) which shows the degree of changeof inclination of the output around output data to input dataxi, by the following equation:

H(xi) =M∑l=1

|yi − yil |||xi − xil ||

(22)

where xil for l∈ZM means the l-th neighborhood-rankingof xi, i∈ZP and yi and yil are output for input xi and xil ,respectively. The number M means the range consideringH(x).Step 3 : Determine the appearance probability pM (xi) forxi by normalizing H(xi).

pM (xi) =

H(xi)∑Pj=1 H(x

j)(23)

The method is called Algorithm AP.

[Example 1]Let us explain the computation about pM (x) in the case ofy = x2−2x+1. The set of D = {(0.1×k, 0.01×k2−0.2×+1)|k ∈ Z∗10} and D∗ = {0.1×k|k ∈ Z∗10} are learning data.The gradient of y = x2 − 2x + 1 is large around x = 0and small around x = 1. If the gradient of the functionis large in an area, it is desired to be set many rules. Toachieve it, the probability pM (x) is desired to be large. If theprobability pM (x) is large, the learning data x∈D is chosenwith high frequency. That is, if the gradient of the function islarge around the learning data x∈D, the probability pM (x)is desired to be large to chose the learning data x with highfrequency. Let us show an example when M = 3. H(x1)for the data (x1, y1) = (0, 1) is calculated by the followingcalculation.

H(x1) =|1− 0.81||0− 0.1|

+|1− 0.64||0− 0.2|

+|1− 0.49||0− 0.3|

= 5.4

For the learning data (x2, y2) = (0.1, 0.81), (x3, y3) =(0.2, 0.64), · · ·, the values H(x2) = 5.2, H(x3) = 5, · · ·are calculated by the same calculation. By using Eq.(23),p3(x1) = 0.16, p3(x2) = 0.15, p3(x3) = 0.15, · · · areobtained(shown in Table I). As shown in Table I, H(x) islargest when x = 0 and smallest when x = 1. □

Learning algorithm F using Algorithm AP to fuzzy mod-eling is obtained as follows:Learning Algorithm FInput : Learning data D = {(xp1, · · · , xpm, yrp)|p ∈ ZP } andD∗ = {(xp1, · · ·, xpm)|p∈ZP }

(a) Positions of reference vectors when the value M issmall

(b) Positions of reference vectors when the value M is large

Fig. 1. Positions of reference vectors for the function y = x2 − 2x+ 1

Output : Parameters c, b and w.Step F1 : The constants θ, T 0max, Tmax and M0 for 1≤M0are set. Let M = M0. The probability pM (x) for x∈D∗ iscomputed using Algorithm AP. The initial number n of rulesis set.Step F2 : The initial values of cij , bij and wi are setrandomly. Let t = 1.Step F3 : Select a data x∈D∗ based on pM (x).Step F4 : Update cij by Eq.(14) to approximate the set D∗by the set of center parameters cij’s of fuzzy rule.Step F5 : If t < T 0max, go to Step F3 with t←t + 1,otherwise go to Step F6 with t←1.Step F6 : Determine bij by Eq.(16).Step F7 : Let p = 1.Step F8 : Given data (xp, yrp)∈D.Step F9 : Calculate µi and y∗ by Eqs.(2) and (4).Step F10 : Update parameters wi, cij and bij by Eqs.(7),(8) and (9).Step F11 : If p < P then go to Step F8 with p←p+ 1.Step F12 : If E > θ and t < Tmax then go to Step F8with t←t+1, where E is computed as Eq.(5), and if E < θthen the algorithm terminate, otherwise go to Step F2 withn←n+ 1.

[Example 2]Fig. 1 shows about examples of reference vectors determinedby using pM (x). The set of learning data is same as Example1. Initial positions of reference vectors are set randomly. Ifthe value M is small, most of output data y is not sufficientlyconsidered and after learning, reference vectors are set withequal intervals as shown in Fig. 1(a). If the value M islarge,some reference vectors are set in the area where thegradient of the function is large as shown Fig. 1(b). Thevalue M must be chosen appropriately for each problem.

□

In Algorithm F, the SDM processes of F7 to F12 are



______________________________________________________________________________________

performed after the initial values of parameters c and b ofF4 to F6 are set.

As shown in Ref. [8], learning algorithm F realizes thatmany rules are needed at or near the places where outputchanges rapidly in learning data. The probability pM (x)is one method to perform it [8]. See the Ref. [8] aboutthe detailed explanation of Algorithms AP and F for thesimplified fuzzy modeling. Learning Algorithm F is themethod based on NG. Likewise, Learning Algorithm F isalso proposed based on k-means.

III. THE PROPOSED FUZZY MODELING USING NG ANDK-MEANS

It is known that the assignment of the initial parameters byVQ can improve the performance of fuzzy modeling based onSDM in terms of accuracy and the number of rules. For TSfuzzy inference model, which is superior to the simplifiedfuzzy inference model in terms of accuracy, it has beenalready proposed to assign the initial values of the weightparameters in the consequent part of fuzzy rules by using NGand its effectiveness has been demonstrated [11]. However,for simplified fuzzy inference model, a similar approach hasnot been proposed, and most of the conventional initializationmethods using VQ take into account only the parameters ofthe antecedent parts of fuzzy rules. Therefore, for simplifiedfuzzy inference model, a new learning method using theLearning Algorithm C is proposed as follows:Learning Algorithm GInput : Learning data D = {(xp, yp)| p∈ZP } and D∗ ={xp|p∈ZP }.Output : Parameters c, b and w.Step G1 : Let θ, Tmax1, Tmax2 and Tmax3 be the thresholdof inference error, the maximum numbers of learning timefor NG and the maximum number of learning time for SDM,respectively.Step G2 : From Algorithm AP, the probability pM (x) iscomputed using the set D of learning data.Step G3 : From Algorithm B∗, the set U of reference vectorsis computed using pM (x).Step G4 : From Algorithm C, parameters ai0’s of LLM arecomputed using the set U .Step G5 : Each element of the set U is set to the centerparameter c of each fuzzy rule and the width b of each fuzzyrule is computed from Eq.(16). Each parameter ai0 of LLMis set to the initial weight wi of fuzzy rule. Let t = 1.Step G6 : Steps A3 to A7 of Algorithm A are performedfor c, b and w.Step G7 : Inference error E(t) of Eq.(5) is calculated. IfE(t)≤θ or t = Tmax3 then go to Step G8. If E(t) > θ andt < Tmax3, then go to Step G6 with t←t+ 1.Step G8 : If E(t)≤θ, then the algorithm terminates. IfE(t) > θ, then go to Step G3 with n←n+ 1.

In Algorithm G, initial values of parameters c and b usingG2 and G3, and parameters w using G4 are considered,respectively. Further, the SDM processes of steps G5 to G7are performed.

Let us explain Algorithm G using an example.[Example 3]

Fig. 2. The figure to explain an Example.

TABLE ITHE APPEARANCE PROBABILITY p3(x) FOR EXAMPLE OF

y = x2 − 2x+ 1

The problem is to approximate the function y = x2−2x+1using Example 1.

From Steps G1 and G2, a probability pM (x) is computedas shown in Table I, where the number r of reference vectorsis 3. From Step G3, NG is performed based on pM (x). LetM = 3. For example, two reference vectors are assignedin the interval from 0 to 0.7, and one reference vector isassigned for the remaining interval. Three regions V1, V2 andV3 shown in Fig.2 are defined as Voronoi regions from theset U of reference vectors. From Step G4, a linear functionis defined in each region. In the example, interval linearfunctions (constants) as shown in Fig.2 are obtained as theresult of Algorithm C. If Algorithm C is not used, eachlinear function (constant) is given randomly as shown inFig.2. From Step G5, the initial parameters of fuzzy rulesare determined, respectively.

At Step G6, the conventional SDM is performed andparameters c, b and w are updated. If sufficient accuracycannot be obtained, a fuzzy rule is adaptively added at StepG8. □

Learning Algorithm G is the method using NG as VQ.Likewise, Learning Algorithm is also proposed based on k-means.

IV. NUMERICAL SIMULATIONS

In order to show the effectiveness of the proposed meth-ods, numerical simulations of function approximation andclassification problems are performed.

A. The conventional and proposed methods

In the following, the conventional algorithms are (a), (b),(c) and (d) and the proposed algorithm is (c’). The methods(a), (b), (c) and (c’) are for simplified fuzzy inference modeland (d) for TS fuzzy inference model [11]. Four the methods(b), (c) and (c’), learning algorithms based on k-means are



______________________________________________________________________________________

also introduced. They are called (b-1), (c-1) and (c’-1),respectively.

(a) The method (a) is Learning Algorithm A, which is theprimitive learning method for the simplified fuzzy system[1]. Initial parameters are set randomly and all parametersare updated using SDM until the inference error becomessufficiently small.(b) The method (b) is generalized one of learning methodof RBF network [1]. The initial values of center parametersare determined using learning data D∗ by NG and the widthparameters are computed using the center parameters fromEq.16 [8]. The initial parameters of the weight are randomlyselected. Further, all parameters are updated using SDM untilthe inference error becomes sufficiently small (See Fig.3(b)).(c) The method (c) is Learning Algorithm F. Initial param-eters of the center are determined using learning data D byNG and the initial parameters of the width are computedusing the center parameters from Eq.16 [8]. The initial as-signment of weight parameters is randomly selected. Further,all parameters are updated using SDM until the inferenceerror becomes sufficiently small (See Fig.3(c)).(c’) The method (c’) is Learning Algorithm G. In addition tothe initial assignment of the method (c), the initial assignmentof weight parameters of the consequent part is determined byLearning Algorithm C. Further, all parameters are updatedusing SDM until the inference error becomes sufficientlysmall (See Fig.3(c’)).(d) The method (d) is Algorithm E in Ref. [11] and is theversion of TS fuzzy model of (c’). The simulation results arepresented only for pattern classification problems.

Likewise, the methods (b-1), (c-1) and (c’-1) are proposedusing k-means instead of NG.

B. Function approximation

The systems are identified by fuzzy inference systems.This simulation uses three systems specified by the followingfunctions with 2 and 4-dimensional input space [0, 1]2 forEq.(24) and [−1, 1]4 for Eqs.(25) and (26), respectively, andone output with the range [0, 1]. Fig.4 shows the figurefor Eq.(24). The numbers of Learning and Test data are1000 and 1000 selected from the uniform random numbers,respectively.

y =sin(10(x1 − 0.5)2 + 10(x2 − 0.5)2) + 1

2(24)

y =(2x1 + 4x

22 + 0.1)

2

74.42

+(3e3x3 + 2e−4x4)−0.5 − 0.077

4.68(25)

y =(2x1 + 4x

22 + 0.1)

2

74.42

× (4 sin(πx3) + 2 cos(πx4) + 6)446.52

(26)

The constants θ, Tmax, Kcij , Kbij and Kwi for each algo-rithm are 1.0×10−4, 50000, 0.01, 0.01 and 0.1, respectively.The constants εinit, εfin and λ for methods (b), (c) and (c’)are 0.1, 0.01 and 0.7, respectively. The constants ε′init, ε

′fin

and λ′ for method (c’) are 0.1, 0.01 and 0.7, respectively.On methods (c) and (c’), the number M for the probability

TABLE IITHE RESULT FOR FUNCTION APPROXIMATION FOR THE METHOD OF

USING NG.

Method Eq.(24) Eq.(25) Eq.(26)the number of rules 5.00 11.60 6.20

(a) MSE for Learning(×10−4) 0.34 0.07 0.03MSE of Test(×10−4) 0.35 0.48 0.44the number of rules 3.90 8.55 4.80

(b) MSE of Learning(×10−4) 0.23 0.07 0.03MSE of Test(×10−4) 0.27 0.09 0.03the number of rules 3.75 7.75 4.65

(c) MSE of Learning(×10−4) 0.22 0.07 0.03MSE of Test(×10−4) 0.23 0.08 0.03the number of rules 3.50 7.60 4.70

(c’) MSE of Learning(×10−4) 0.26 0.07 0.03MSE of Test(×10−4) 0.27 0.09 0.03

pM (x) is 500 for Eqs.(24) and (26) and 300 for Eq.(25),respectively. Fig.4 shows the figure for Eq.(24)

In Table II, the number of rules and MSE’s for learning andtest are shown, where the number of rules means one whenthe threshold θ = 1.0×10−4 of inference error is achieved inlearning. The result of simulation is the average value fromtwenty trials. As a result, proposed method (c’) reduces thenumber of rules compared to conventional methods whilekeeping the accuracy.

C. Classification problems

Iris, Wine, BCW and Sonar data from the UCI databaseare used for numerical simulation [13]. The numbers of dataare 150, 178, 683 and 208, the numbers of input are 4,13, 9 and 60 and the numbers of classes are 3, 3, 2 and2, respectively. In this simulation, 5-fold cross-validation isused as the evaluation method. Threshold θ is 0.01 on Iris,0.001 on Wine and 0.02 on BCW and Sonar, respectively.The constants Tmax, Kcij , Kbij and Kwi for each methodare 50000, 0.01, 0.01 and 0.1, respectively. The constantsεinit, εfin and λ for methods (b), (c), and (c’) are 0.1, 0.01and 0.7, respectively. The constants ε′init, ε

′fin and λ

′ formethods (c’) and (d) are 0.1, 0.01 and 0.7, respectively. Onmethods (c), (c’) and (d), the number M for the probabilitypM (x) is 100 for Iris, Wine and Sonar and 200 for BCW,respectively.

Table III shows the result of classification for each method,where the number of rules means one when the threshold θof inference error is achieved in learning. In Table III, thenumber of rules and RM’s for learning and test are shown,where RM means the rate of misclassification. The resultof simulation is the average value from twenty trials. It isshown that, among all the results, the proposed method (c’)achieves the best or comparable performance in terms ofboth of accuracy of the number of rules. It should be notedthat the proposed method (c’) outperforms the conventionalmethod (d) using TS fuzzy model for most cases.

D. Numerical simulations for the proposed method using k-means

Similar simulations are performed for the algorithm withk-means instead of NG. The parameters used for learning arethe same except for λ, and the number of experiments is alsothe same. k-means is used for (b), (c), and (c’) except (a).Table IV shows the results for the function approximation of



______________________________________________________________________________________

Fig. 3. Four charts of conventional and proposed algorithms : SDM and NG mean Steepest Descent Method and Neural Gas. NG (k-means) meansto use NG or k-means. The feedback loop (the under arrow) means adding of a fuzzy rule, where D = {(xp1, · · · , x

pm, y

rp)|p ∈ ZP } and D∗ =

{(xp1, · · ·, xpm)|p∈ZP }.

Fig. 4. The figure for the function y = sin(10(x1−0.5)2+10(x2−0.5)2)+12

of Eq.(24).

TABLE IIITHE RESULT FOR PATTERN CLASSIFICATION FOR THE METHOD OF USING

NG.

Method Iris Wine BCW Sonarthe number of rules 2.34 4.14 2.60 7.11

(a) RM for Learning(%) 3.54 0.27 2.21 0.99RM of Test(%) 4.90 4.47 3.67 18.07

the number of rules 2.15 3.34 2.36 4.91(b) RM of Learning(%) 3.42 0.30 2.23 0.67

RM of Test(%) 4.40 3.33 3.66 17.26the number of rules 2.10 3.26 2.05 2.61

(c) RM of Learning(%) 3.48 0.26 2.19 1.73RM of Test(%) 4.50 3.31 3.43 17.12

the number of rules 2.00 3.00 2.00 2.41(c’) RM of Learning(%) 3.50 0.35 2.19 1.81


(d) RM of Learning(%) 2.57 1.46 2.46 0.14RM of Test(%) 4.33 4.42 3.87 23.45

TABLE IVTHE RESULT FOR FUNCTION APPROXIMATION FOR THE METHOD OF

USING K-MEANS.

Method Eq.(24) Eq.(25) Eq.(26)the number of rules 5.00 11.60 6.20

(a) MSE for Learning(×10−4) 0.34 0.07 0.03MSE of Test(×10−4) 0.35 0.48 0.44the number of rules 4.30 8.45 6.00

(b-1) MSE of Learning(×10−4) 0.33 0.07 0.03MSE of Test(×10−4) 0.49 0.10 0.04the number of rules 3.85 8.20 5.60

(c-1) MSE of Learning(×10−4) 0.28 0.07 0.04MSE of Test(×10−4) 0.29 0.09 0.05the number of rules 3.90 8.30 5.90

(c’-1) MSE of Learning(×10−4) 0.22 0.08 0.03MSE of Test(×10−4) 0.25 0.10 0.04

TABLE VTHE RESULT FOR PATTERN CLASSIFICATION FOR THE METHOD OF USING

K-MEANS.

Method Iris Wine BCW Sonarthe number of rules 2.34 4.14 2.60 7.11

(a) RM for Learning(%) 3.54 0.27 2.21 0.99RM of Test(%) 4.90 4.47 3.67 18.07

the number of rules 3.36 3.51 2.53 2.19(b-1) RM of Learning(%) 4.1 0.3 1.7 2.1


(c-1) RM of Learning(%) 4.2 0.2 1.8 2.1RM of Test(%) 5.0 3.3 17.4 3.4

the number of rules 3.07 3.12 2.51 2.04(c’-1) RM of Learning(%) 4.1 0.2 1.8 2.1

RM of Test(%) 4.9 2.7 17.2 3.5

Eqs. (24), (25) and (26). Although (c) and (c’) are superiorto (a) and (b), there is no difference in ability between (c)and (c’).

Similarly, Table V shows the results for the classificationproblem. In this case, the parameters used for learning arealso the same except for λ, and the number of experimentsis the same. In this case, (c) and (c’) are superior to (a) and(b), but there is no difference in ability between (c) and (c’).

Furthermore, let us consider the difference of performance



______________________________________________________________________________________

when NG and k-means are used. Learning algorithm usingNG is superior in the accuracy and the number of rules to themethod using k-means, but the latter is superior in learningtime to the former.

These should be selected according to the purpose.

V. CONCLUSION

In this paper, new learning methods of fuzzy modelingusing NG and k-means with supervised learning (LLM) wereproposed for simplified fuzzy inference model. Conventionallearning methods using NG and k-means were to performinitial assignment only for antecedent part of fuzzy rules.In order to assign the initial values of weight parametersin the consequent part of fuzzy rules, the proposed methodfirstly divided the input space into Voronoi subregion byusing NG (or k-means) and then assigned a constant toeach subregion by using supervised learning (LLM). Thesimulation results on function approximation and patternclassification showed that the proposed method using NGachieved the best or comparable performance in terms ofboth of accuracy and the number of rules. Notably, thesimulation results of pattern classification showed that theproposed method using NG for simplified fuzzy model wasmore effective than the similar approach for TS fuzzy model.From the result, it is considered that simpler model is suitablefor this approach. Further, the proposed method was alsosuperior to the method using generalized inverse matrix todetermine the initial assignment of weight parameters [14].Furthermore, the proposed method using NG was superior inaccuracy and the number of rules to one using k-means. Infuture work, further improvement of fuzzy modeling usingVQ will be considered and applied to other problems.

REFERENCES

[1] M.M. Gupta, L. Jin and N. Homma, ”Static and Dynamic NeuralNetworks,” IEEE Press, 2003.

[2] J. Casillas, O. Cordon, F. Herrera and L. Magdalena, ”AccuracyImprovements in Linguistic Fuzzy Modeling,” Studies in Fuzziness andSoft Computing, Vol. 129, Springer, 2003.

[3] S. Fukumoto, H. Miyajima, K. Kishida and Y. Nagasawa, ”A Destruc-tive Learning Method of Fuzzy Inference Rules,” Proc. of IEEE onFuzzy Systems, 1995, pp.687-694.

[4] O. Cordon, ”A historical review of evolutionary learning methodsfor Mamdani-type fuzzy rule-based systems, Designing interpretablegenetic fuzzy systems,” Journal of Approximate Reasoning, 52, 2011,pp.894-913.

[5] H. Miyajima, N. Shigei and H. Miyajima, ”Fuzzy Inference SystemsComposed of Double-Input Rule Modules for Obstacle AvoidanceProblems,” IAENG International Journal of Computer Science, Vol. 41,Issue 4, 2014, pp.222-230.

[6] K. Kishida and H. Miyajima, ”A Learning Method of Fuzzy InferenceRules using Vector Quantization,” Proc. of the Int. Conf. on ArtificialNeural Networks, Vol.2, 1998, pp.827-832.

[7] S. Fukumoto, H. Miyajima, N. Shigei and K. Uchikoba, ”A DecisionProcedure of the Initial Values of Fuzzy Inference System UsingCounterpropagation Networks,” Journal of Signal Processing, Vol.9,No.4, 2005, pp.335-342.

[8] H. Miyajima, N. Shigei and H. Miyajima, ”Fuzzy Modeling usingVector Quantization based on Input and Output Learning Data,” LectureNotes in Engineering and Computer Science: Proceedings of TheInternational MultiConference of Engineers and Computer Scientists2017, 15-17 March, 2017, Hong Kong, pp.1-6.

[9] W. Pedrycz, H. Izakian, ”Cluster-Centric Fuzzy Modeling,” IEEE Trans.on Fuzzy Systems, Vol. 22, Issue 6, 2014, pp. 1585-1597.

[10] T. M. Martinetz, S. G. Berkovich and K. J. Schulten, ”Neural GasNetwork for Vector Quantization and its Application to Time-seriesPrediction,” IEEE Trans. Neural Network, 4, 4, 1993, pp.558-569.

[11] H. Miyajima, N. Shigei and H. Miyajima, ”Fuzzy Modeling usingVector Quantization with Supervised Learning,” Lecture Notes in En-gineering and Computer Science: Proceedings of The InternationalMultiConference of Engineers and Computer Scientists 2018, 14-16March, 2018, Hong Kong, pp.17-22.

[12] H. Miyajima, N. Shigei and H. Miyajima, ”Fuzzy Modeling usingNeural Gas, World Congress on Engineering and Computer Science2019,” Lecture Notes in Engineering and Computer Science: Proceed-ings of The World Congress on Engineering and Computer Science2019, 22-24 October, 2019, San Francisco, USA, pp409-414.

[13] UCI Repository of Machine LearningDatabases:,https://archive.ics.uci.edu/ml/index.php (Mar.20.2019).

[14] H. Miyajima, N. Shigei and H. Miyajima, ”Learning Algorithms forFuzzy Inference Systems using Vector Quantization,” Intech Open, DOI:10.5772/intechopen.79925, 2018.



______________________________________________________________________________________

Documents

Fuzzy Modeling using Vector Quantization and Local Linear ...simpliﬁed fuzzy modeling [12], [14]. In this paper, the concept of Vector Quantization such as NG and k-means with LLM