Reliable AI through SVDD and rule extraction

This paper was downloaded from TechRxiv (https://www.techrxiv.org).

LICENSE: CC BY 4.0
SUBMISSION DATE / POSTED DATE: 19-05-2021 / 02-06-2021
CITATION: Carlevaro, Alberto (2021): Reliable AI through SVDD and rule extraction. TechRxiv. Preprint. https://doi.org/10.36227/techrxiv.14618088.v1
DOI: 10.36227/techrxiv.14618088.v1




JOURNAL OF INTELLIGENT SYSTEMS

Reliable AI through SVDD and rule extraction

Alberto Carlevaro, DITEN-UNIGE, and Maurizio Mongelli, IEIIT-CNR

Abstract—This paper addresses how Support Vector Data Description (SVDD) can be used to detect safety regions with zero statistical error. It provides a detailed methodology for the applicability of SVDD in real-life applications, such as Vehicle Platooning, by addressing common machine learning problems such as parameter tuning and handling large data sets. Intelligible analytics for knowledge extraction with rules is also presented, targeted at understanding safety regions of system parameters. Results are shown by feeding simulated data to the training of different rule extraction mechanisms.

Index Terms—SVDD, Safety regions, Explainable AI.


1 INTRODUCTION

THE study proposed in this paper follows the recent trend dedicated to identifying and handling assurance under uncertainties in AI systems [24]. It falls in the category of improving the reliability of prediction confidence. The topic remains a significant challenge in machine learning, as learning algorithms proliferate into difficult real-world pattern recognition applications. The intrinsic statistical error introduced by any machine learning algorithm may lead to criticism by safety engineers. The topic has received great interest from industry [26], in particular in the automotive [28] and avionics [5] sectors. In this perspective, the conformal prediction framework [3] studies methodologies to associate reliable measures of confidence with pattern recognition settings, including classification, regression and clustering. The proposed approach follows this direction, by identifying methods to circumscribe data-driven safety envelopes with zero statistical error. We show how this assurance may considerably limit the size of the safety envelope (e.g., providing collision avoidance by drastically reducing the speed of vehicles) and focus on how to find a good balance between the assurance and the safety space.

We concentrated our work on a specific machine learning method, Support Vector Data Description, which by definition is particularly suitable for defining safety envelopes (see Section 2). To it we have added intelligible models for knowledge extraction with rules: intelligibility means that the model is easily understandable, e.g. when it is expressed by Boolean rules. Decision trees (DTs) are typically used towards this aim. The comprehension of neural network models (and of the largest part of the other ML techniques) reveals itself to be a hard task (see, e.g., Section 4 of [14]). Together with DTs, we use the logic learning machine (LLM), which may show more versatility in rule generation and classification precision.

Our work takes a step forward in these areas in that:

  • safety regions are tuned on the basis of the radius of the SVDD hypersphere;
  • a simple rule extraction method from SVDD is compared with LLM and DT.

A. Carlevaro is with the Department of Electrical, Electronics and Telecommunication Engineering and Naval Architecture (DITEN), University of Genoa, Genoa, Italy. E-mail: [email protected]

M. Mongelli is with the Institute of Electronics, Computer and Telecommunication Engineering (IEIIT), Italian National Research Council (CNR). E-mail: [email protected]

The article is organized as follows: first, a detailed introduction to SVDD and negative SVDD is given, also focusing on how to choose the best model parameters (Section 2.2) and how to handle large datasets (Section 2.3). Then Section 3 is devoted to rule extraction: LLM and DT are presented and the extraction of intelligible rules from SVDD is explained. Finally, an application example is proposed in Section 4.

2 SUPPORT VECTOR DATA DESCRIPTION

Characterizing a data set in a complete and exhaustive way is an essential preliminary step for any action to be performed on it. Having a good description of a data set means being able to easily understand whether a new observation contributes to the information brought by the rest of the data or is totally irrelevant. The task of data domain description is precisely to identify a region, a border, in which to enclose a certain type of information as precisely as possible, i.e. without adding misinformation or empty spaces. This idea is realized mathematically by a circumference (a sphere, or a hypersphere, depending on the dimension of the data space) that encloses as many points with as little area (volume) as possible. Indeed, SVDD can also be used to perform classification of a specific class of target objects, i.e. it is possible to identify a region (a closed boundary) in which objects that should be rejected are not allowed.

This section is organized as follows: SVDD is introducedas in [29], focusing first on the normal description and thenon the description with negative examples [30]. Then wewill focus on two proposed algorithms for solving two prob-lems involving SVDD: fast training of large data sets [4] andautonomous detection of SVDD parameters [32]. Finally,the last subsection is devoted to two original methods forfinding zero False Negative Rate (FNR) regions with SVDD.


Figure 1. SVDD with (a) linear kernel K(xi, xj) = xi · xj (C = 0.05); (b) polynomial kernel K(xi, xj) = (1 + xi · xj)^d (C = 0.05, d = 2); (c) Gaussian kernel K(xi, xj) = exp(−||xi − xj||²/σ²) (C = 0.05, σ = 0.8). In red are plotted the SVs (with αi < C) of the description.

2.1 Theory

Let {xi}, i = 1, . . . , N, with xi ∈ R^d, d ≥ 1, be a training set for which we want to obtain a description. We want to find a sphere (a hypersphere) of radius R and center a with minimum volume, containing all (or most of) the data objects.

2.1.1 Normal Data Description

To find the decision boundary which captures the normal instances and at the same time keeps the hypersphere's volume at a minimum, it is necessary to solve the following optimization problem [30]:

min_{R,a} F(R, a) = R²   s.t.   ||xi − a||² ≤ R²   ∀i   (1)

But to allow for the possibility of outliers in the training set, analogously to what happens for soft-margin SVMs [1], slack variables ξi ≥ 0 are introduced and the minimization problem changes into [30]:

min_{R,a,ξi} F(R, a, ξi) = R² + C ∑i ξi   (2)

s.t.   ||xi − a||² ≤ R² + ξi,   ξi ≥ 0,   i = 1, . . . , N   (3)

where the parameter C controls the influence of the slackvariables and thereby the trade-off between the volume andthe errors.

The optimisation problem is solved by incorporating the constraints (3) into equation (2) using the method of Lagrange for positive inequality constraints [10]:

L(R, a, ξi, αi, γi) = R² + C ∑i ξi − ∑i αi [R² + ξi − (||xi||² − 2 a · xi + ||a||²)] − ∑i γi ξi   (4)

with the Lagrange multipliers αi ≥ 0 and γi ≥ 0. According to [29], L should be minimized with respect to R, a, ξi and maximized with respect to αi and γi.

Setting the partial derivatives with respect to R, a and ξi to zero gives the constraints [8]:

∂L/∂R = 0:   ∑i αi = 1,      ∂L/∂a = 0:   a = ∑i αi xi   (5)

∂L/∂ξi = 0:   C − αi − γi = 0  ⇒  0 ≤ αi ≤ C   (6)

and then, substituting (5) into (4), gives the dual problem of (2) and (3):

max_{αi} L = ∑i αi (xi · xi) − ∑i,j αi αj (xi · xj)   (7)

s.t.   ∑i αi = 1,   0 ≤ αi ≤ C,   i = 1, . . . , N   (8)

Maximizing (7) under (8) allows all αi to be determined, and then the parameters a and ξi can be deduced.

A training object xi and its corresponding αi satisfy oneof the following conditions [29], [30]:

||xi − a||2 < R2 ⇒ αi = 0 (9)

||xi − a||2 = R2 ⇒ 0 < αi < C (10)

||xi − a||2 > R2 ⇒ αi = C (11)

Since a is a linear combination of the objects with αi as coefficients, only objects with αi > 0 are needed in the description: these objects are therefore called the support vectors (SVs) of the description. So, by definition, R² is the distance from the center of the sphere to (any of the support vectors on) the boundary, i.e. objects with 0 < αi < C. Therefore

R² = ||xk − a||² = (xk · xk) − 2 ∑i αi (xk · xi) + ∑i,j αi αj (xi · xj) = Ta(xk)   (12)

for any xk ∈ SV<C, the set of support vectors with αk < C.
To test a new object z it is necessary to calculate its distance Ta(z) from the center of the sphere and compare it with R²:


Figure 2. Negative SVDD applied to a two-spirals shaped data set [21]. It is interesting to note that changing the target objects only requires flipping the labels. The asterisked points are the SVs on the edge, depending on the respective class.

sgn(R² − Ta(z)) = +1 if z is inside the sphere, −1 if z is outside the sphere   (13)

As is common in machine learning theory [33], the method can be made more flexible [29], [30] by replacing all the inner products (xi · xj) with a kernel function K(xi, xj) satisfying Mercer's theorem. The data are mapped into a higher dimensional space via a feature map, where the previous spherical classification is computed. The polynomial kernel and the Gaussian kernel are discussed in [29], [30].

An example description by SVDD with different kernel functions for a 2-dimensional Gaussian data set is shown in Fig. 1. The 1000 data points are generated by a Gaussian distribution with mean [0, 0] and variance 1. The figures are drawn in Matlab and the description boundary is shown by a 2D contour plot.
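The test of a new object, eqs. (12)–(13), takes only a few lines of code. The snippet below is a minimal sketch with a linear kernel, where the αi are supplied by hand for a toy configuration rather than obtained by solving the dual (7)–(8):

```python
import numpy as np

def svdd_distance2(z, X, alpha):
    """T_a(z) of eq. (12) with a linear kernel:
    z.z - 2 sum_i alpha_i (z.x_i) + sum_{i,j} alpha_i alpha_j (x_i.x_j)."""
    cross = alpha @ (X @ z)            # sum_i alpha_i (z . x_i)
    quad = alpha @ (X @ X.T) @ alpha   # sum_{i,j} alpha_i alpha_j (x_i . x_j)
    return float(z @ z - 2.0 * cross + quad)

def svdd_classify(z, X, alpha, R2):
    """Eq. (13): +1 if z falls inside the sphere, -1 otherwise."""
    return 1 if R2 - svdd_distance2(z, X, alpha) >= 0 else -1

# Hand-picked toy configuration (NOT a solved dual): two boundary SVs,
# sum_i alpha_i = 1, so the center is a = [1, 0].
X = np.array([[0.0, 0.0], [2.0, 0.0]])
alpha = np.array([0.5, 0.5])
R2 = svdd_distance2(X[0], X, alpha)   # eq. (12) evaluated on a support vector
```

With a linear kernel the center is a = ∑i αi xi, so the toy sphere above is centered at [1, 0] with R² = 1; `svdd_classify` then returns +1 for points inside it and −1 for points outside.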

2.1.2 Negative Examples Data Description

When two (or more) classes of data are available and it is necessary to identify a specific one among the others, SVDD can be trained to distinguish objects that should be included in the description from those that should be rejected. This use of SVDD can be very valuable in real-world applications where, for example, a safety region must be determined (see Section 4).

In the following, the target objects are enumerated by indices i, j and the negative examples by l, m. We assume that target objects are labeled yi = 1 and outlier objects are labeled yl = −1.

In the same way as before, we want to solve this opti-mization problem:

min_{R,a,ξi,ξl} F(R, a, ξi, ξl) = R² + C1 ∑i ξi + C2 ∑l ξl   (14)

s.t.   ||xi − a||² ≤ R² + ξi,   ||xl − a||² ≥ R² − ξl,   ξi ≥ 0, ξl ≥ 0   ∀i, l   (15)

The constraints are again incorporated in equation (14) and the Lagrange multipliers αi, αl, γi, γl are introduced [30]:

L(R, a, ξi, ξl, αi, αl, γi, γl) = R² + C1 ∑i ξi + C2 ∑l ξl − ∑i γi ξi − ∑l γl ξl − ∑i αi [R² + ξi − (xi − a)²] − ∑l αl [(xl − a)² − R² + ξl]   (16)

with αi ≥ 0, αl ≥ 0, γi ≥ 0, γl ≥ 0.

Setting the partial derivatives of L with respect to R, a, ξi and ξl to zero gives new constraints [30]:

∑i αi − ∑l αl = 1,      a = ∑i αi xi − ∑l αl xl   (17)

0 ≤ αi ≤ C1,   0 ≤ αl ≤ C2   ∀i, l   (18)

and substituting (17) into equation (16) we obtain, similarly to before, the dual problem of (14) and (15):

max_{αi,αl} L = ∑i αi (xi · xi) − ∑l αl (xl · xl) − ∑i,j αi αj (xi · xj) + 2 ∑l,j αl αj (xl · xj) − ∑l,m αl αm (xl · xm)   (19)

s.t.   ∑i αi − ∑l αl = 1,   0 ≤ αi ≤ C1 ∀i,   0 ≤ αl ≤ C2 ∀l   (20)

Again, solving the previous optimization problem allows αi and αl to be determined, and then we can classify all the data set objects according to the respective Lagrange coefficient:

||xi − a||² < R² ⇒ αi = 0;      ||xl − a||² < R² ⇒ αl = C2   (21)
||xi − a||² = R² ⇒ 0 < αi < C1   (22)
||xl − a||² = R² ⇒ 0 < αl < C2   (23)
||xi − a||² > R² ⇒ αi = C1;     ||xl − a||² > R² ⇒ αl = 0   (24)


Similarly, we test a new point z based on its distance from the center

||z − a||² = (z · z) − 2 (∑i αi (z · xi) − ∑l αl (z · xl)) + ∑i,j αi αj (xi · xj) − 2 ∑l,j αl αj (xl · xj) + ∑l,m αl αm (xl · xm) := Ta(z)   (25)

and we evaluate it against the squared radius

sgn(R² − Ta(z)) = +1 if z is inside the sphere, −1 if z is outside the sphere   (26)

where the radius is calculated as the distance from the center a of any SV on the edge (0 < αi < C1, 0 < αl < C2):

R² = Ta(xk)   for any xk ∈ SV<C1,<C2   (27)

Similarly to before, it is possible to replace all the innerproducts (xi ·xj) with a kernel function K(xi,xj) [29], [30],[33] to obtain a more flexible description.

An example of negative SVDD is shown in Fig. 2: a Gaussian kernel with σ = 3 is used and the parameters C1 and C2 are both set to 0.25.
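Note that folding the labels into signed coefficients βi = yi αi collapses the five sums of eq. (25) into a single quadratic form, which makes the test compact to code. A small sketch with a linear kernel (the α values are hand-picked to satisfy ∑i αi − ∑l αl = 1, not obtained from the dual (19)–(20)):

```python
import numpy as np

def nsvdd_distance2(z, X, y, alpha):
    """T_a(z) of eq. (25), linear kernel, via beta_i = y_i * alpha_i
    (y = +1 for targets, -1 for negative examples)."""
    beta = y * alpha
    return float(z @ z - 2.0 * beta @ (X @ z) + beta @ (X @ X.T) @ beta)

# Two targets and one negative example; by eq. (17) the center is
# a = 0.75*x1 + 0.75*x2 - 0.5*x3 = [1, 0].
X = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0])
alpha = np.array([0.75, 0.75, 0.5])   # sum_i alpha_i - sum_l alpha_l = 1
```

For instance, `nsvdd_distance2(np.array([1.0, 0.0]), X, y, alpha)` evaluates to 0, consistent with the center of the toy description being [1, 0].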

2.2 Autonomous Detection of SVDD Parameters with RBF Kernel

Like most machine learning models, SVDD is heavily influenced by the choice of model parameters. It is necessary to find the best trade-off between error and coverage by choosing suitable C1 and C2 and the best kernel parameter σ that avoids overfitting or underfitting issues.

For this work we focus on the RBF kernel, since it is well known to perform well in practical applications [29].

The method used to find the best model parameters is inspired by the work presented in [32], which proposes autonomous detection of the normal SVDD parameters based only on the training set, since in normal SVDD it is not possible to use cross-validation because only true positives and false negatives can occur during training. In our work, instead, we combined some techniques from [32] with cross-validation to find the best C1, C2 and σ parameters for negative SVDD.

The regularisation parameters C1, C2 are lower bounded by 1/N1 and 1/N2 respectively, where N1 is the number of target objects and N2 the number of negative examples (N1 + N2 = N) [29], [30], [32]. When no errors are expected in one class of the training set we can set Ci = 1 (i = 1, 2), indicating that all objects of the target class of the training set should be accepted (C1 = 1) and all outliers should be rejected (C2 = 1). So the value ranges for C1 and C2 are

1/N1 ≤ C1 ≤ 1,      1/N2 ≤ C2 ≤ 1   (28)

The second parameter to be optimised is the kernel width σ. For high values of σ the shape of the SVDD becomes spherical, with the risk of underfitting, while for small values of σ too many objects become support vectors and the model is prone to overfitting.

Figure 3. For too small or too high values of σ the optimization criterion λ (our metric for the 'best error') is high. Note also the behavior of the SVs, which is very similar to the one described in [29], [30].

The search for the best parameters is performed by constructing a grid over C1, C2 and σ, on which holdout cross-validation is performed. The optimization criterion is chosen according to [32], selecting the parameters whose respective misclassification error e and radius R minimize

λ = √(e² + |1 − R|²)   (29)

over all triples (C1, C2, σ) in the grid. The idea behind (29) is that minimizing the misclassification error means reducing the number of support vectors [29], [30] (and so reducing overfitting), while constraining the radius to be close to 1 means choosing a small σ [32] (and so reducing underfitting). The balance between these two terms thus seems the best criterion for finding the best parameters (see Fig. 3).
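The grid search with criterion (29) is straightforward to sketch. In the snippet below, `evaluate` is a user-supplied callback (an assumption of this sketch, not an API from the paper) that trains negative SVDD on the training fold and returns the holdout misclassification error e and the radius R:

```python
import itertools
import math

def select_parameters(grid_C1, grid_C2, grid_sigma, evaluate):
    """Return the triple (C1, C2, sigma) minimizing eq. (29),
    lambda = sqrt(e^2 + |1 - R|^2), over the Cartesian grid.
    `evaluate(C1, C2, sigma)` must return (e, R) from holdout validation."""
    best, best_lam = None, math.inf
    for C1, C2, sigma in itertools.product(grid_C1, grid_C2, grid_sigma):
        e, R = evaluate(C1, C2, sigma)
        lam = math.sqrt(e ** 2 + abs(1.0 - R) ** 2)
        if lam < best_lam:
            best, best_lam = (C1, C2, sigma), lam
    return best, best_lam
```

Any SVDD trainer can be plugged in through the callback, which keeps the selection logic independent of the training code.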

2.3 Fast Training SVDD

The curse of dimensionality affects many optimization and machine learning problems, and SVDD is no exception. To overcome this problem, a method based on iterative training on only the SVs is proposed in [4].

The method iteratively samples from the training data set with the objective of updating a set of support vectors called the master set of support vectors (SV*). During each iteration, the method updates SV* and the corresponding threshold value R² and center a. As the threshold value R² increases, the volume enclosed by SV* increases. The method stops iterating and provides a solution when the threshold value R² and the center a converge. At convergence, the members of the master set of support vectors SV* characterize the description of the training data set.
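The iterative scheme can be sketched as follows. For self-containment, the SVDD training step is replaced here by a toy stand-in (a ball centered at the mean of the current master set); a real implementation would instead solve the dual of Section 2.1 on the master set at each iteration, as in [4]:

```python
import numpy as np

def fit_ball(S):
    """Toy stand-in for the SVDD training step: a ball centered at the
    mean of S, with radius reaching the farthest point of S."""
    a = S.mean(axis=0)
    R2 = ((S - a) ** 2).sum(axis=1).max()
    return a, R2

def fast_train(X, n0=10, batch=10, seed=0, tol=1e-9, max_iter=1000):
    """Sketch of the iterative scheme of [4]: keep a master set SV*,
    fit on it, add sampled points falling outside the current ball,
    and stop when no point of X is left outside (a and R2 converge)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=min(n0, len(X)), replace=False)
    master = X[idx]
    a, R2 = fit_ball(master)
    for _ in range(max_iter):
        d2 = ((X - a) ** 2).sum(axis=1)
        outside = X[d2 > R2 + tol]
        if len(outside) == 0:
            break                       # every point is enclosed: converged
        pick = rng.choice(len(outside), size=min(batch, len(outside)),
                          replace=False)
        master = np.vstack([master, pick.size and outside[pick]])
        a, R2 = fit_ball(master)
    return a, R2, master
```

The loop terminates because the master set grows strictly while violating points remain, which mirrors the monotone growth of R² described above; only the training subroutine would need to change to recover the actual method of [4].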

2.4 Zero FNR Regions with SVDD

The search for safety regions is a well-known task in machine learning [11], [12], [13], and the main focus is to avoid false negatives, i.e., unsafe points included in the safe region. In this section, two methods for finding zero-FNR regions are proposed: the first one is based simply on the


reduction of the SVDD radius until only safe points are enclosed in the SVDD shape; the second one instead performs successive iterations of the SVDD on the safe region until there are no more negative points.

2.4.1 Radius Reduction

Since the shape of the SVDD is a sphere also in the space obtained via the feature mapping, it is reasonable to expect that reducing the volume of the sphere reduces the number of misclassified negative points. We implemented this simple procedure in Matlab and tested it on several datasets (see Fig. 4):

Algorithm 1 RadiusReduction
  The data set X × Y is divided into a training set Xtr × Ytr and a test set Xts × Yts
  SVDD cross-validation on Xtr × Ytr
  [a, R2] = SVDD(Xtr, Ytr, C1, C2, param)
  maxiter = 1000; i = 1
  while i < maxiter
    R2 = R2 − 10e-5 * R2
    Test SVDD on Xts × Yts
    if FNR < ε
      return [a, R2]
    end
    i = i + 1
  end

Figure 4. Application of Algorithm 1 on a data set of 400 points sampled from a Gaussian with mean [1, 1] and variance 1, with 200 target objects and 200 negative examples: (a) FNR = 0.517, (b) FNR = 0.095. The algorithm converged in 12 iterations.

2.4.2 SVDD Zero FNR Iterative Procedure

Here we present another algorithm for finding zero-FNR regions with SVDD. The idea is simply to perform successive SVDDs on the safe regions found by a preliminary SVDD, so as to remove unsafe points. Again, we stop when we reach a fixed number of iterations or when the condition on the FNR is satisfied.

Algorithm 2 ZeroFNRSVDD
  The data set X × Y is divided into a training set Xtr × Ytr and a test set Xts × Yts
  SVDD cross-validation on Xtr × Ytr
  [a, R2] = SVDD(Xtr, Ytr, C1, C2, param)
  Test SVDD on Xts × Yts
  maxiter = 1000; i = 1
  while i < maxiter
    Xtr_i = "safety"(Xts)      (the points of Xts classified as safe)
    SVDD cross-validation on Xtr_i × Ytr_i
    [a_i, R2_i] = SVDD(Xtr_i, Ytr_i, C1, C2, param)
    Test SVDD on Xts × Yts
    if FNR < ε
      return [a_i, R2_i]
    end
    i = i + 1
  end

Figure 5. Application of Algorithm 2 on a data set of 2000 target objects sampled from a Gaussian with mean [1, 1] and variance 4 and 100 negative examples sampled from a Gaussian with mean [1, 1] and variance 5: (a) first iteration of the algorithm, FNR = 0.925; (b) convergence at the 97th iteration, FNR = 0.079.


We implemented this algorithm in Matlab and tested it using data from [19]. Fig. 5 reports an example with a 2-dimensional Gaussian data set.

3 RULES EXTRACTION

We now consider how to make the SVDD explainable, in order to make its inherent logic explicit and to use the extracted rules for further safety envelope tuning as in [12].

Let us suppose we have an information vector I and a classification problem with two classes, ω = 0 or ω = 1. Let ℵ = {(Ik, ωk), k = 1, . . . , i} be a data set corresponding to the collection of events representing the evolution of a dynamical system (ω) under different system settings (I(·)). The classification problem consists of finding the best boundary function f(I(·), ·) separating the Ik points in ℵ according to the two classes ω = 0 or ω = 1. For the case of SVDD, the best boundary f is simply the shape of the hypersphere. Although the shape of the hypersphere is quite intelligible (a center and a radius are enough to describe it), it is still interesting to have a rule-based description of it.

3.1 Logic Learning Machine

The derivation of f(I(·), ·) in a rule-based shape is made by DT and LLM (the analysis was performed through the Rulex software suite, developed and distributed by Rulex Inc. (http://www.rulex.ai/)). They are both based on a set of intelligible rules of the type if (premise) then (consequence), where (premise) is a logical product (AND, ∧) of conditions and (consequence) provides a class assignment for the output. In the present study, the two classes correspond to the presence or the absence of anomalous patterns. LLM rules are obtained through a three-step process. In the first phase (discretisation and latticisation) each variable is transformed into a string of binary data in a proper Boolean lattice, using the inverse only-one code binarisation. All strings are eventually concatenated into one unique large string per sample. In the second phase (shadow clustering) a set of binary values, called implicants, are generated, which allow the identification of groups of points associated with a specific class. (An implicant is defined as a binary string in a Boolean lattice that uniquely determines a group of points associated with a given class. It is straightforward to derive from an implicant an intelligible rule having in its premise a logical product of threshold conditions based on cut-offs obtained during the discretisation step. The optimal placement of these cut-offs is, therefore, an important phase for extracting the highest information gain before clustering [2].) During the third phase (rule generation) all implicants are transformed into a collection of simple conditions and eventually combined into a set of intelligible rules. The reader interested in shadow clustering and algorithms for efficient rule generation is referred to [15] and references therein.

3.2 Rules Extraction from SVDD

As far as SVDD is concerned, the derivation of intelligible rules is made as follows: after an SVDD is computed and tested, a new data set of observations is provided and classified via SVDD. The new dynamical system obtained is then exported into Rulex, and an LLM algorithm with zero error or a DT algorithm is executed over the data, obtaining the set of intelligible rules. Algorithm 3 summarizes the procedure:

Algorithm 3 IRulesSVDD
  Apply Algorithm 1 or Algorithm 2 on the X × Y data set
  Randomly generate a new data set Xnew as a copy of X
  Classify Xnew into Ynew with [a, R2] from Algorithm 1/Algorithm 2
  Apply the LLM/DT algorithm
  Find an explained safety region R
  return R

For example, in the case of vehicle platooning (see Section 4), the first three rules by covering (i.e. by how many points are covered by the rule) of the SVDD (Algorithm 2) using LLM are

if (N < 7) ∧ (F0 > −8 ∧ F0 ≤ −3) then safe
if (d(0) ≤ 8.99) ∧ (v(0) > 12 ∧ v(0) ≤ 23) then safe
if (N < 6) ∧ (PER > 0.08 ∧ PER ≤ 0.46) then safe

As in [12], we applied these rules with the goal of maximizing the number of safe points (that is, the number of points in the target class) while keeping the FNR at zero. This is possible by performing rule tuning as in [12], but SVDD allows for much more flexibility.

Figure 6. Rule Viewer

Figure 6 shows, as an example, a summary of the rules extracted with LLM from SVDD (Algorithm 2) in the case of vehicle platooning (see Section 4.2). Each circle represents a rule: the larger the circle, the more points the respective rule covers. In this example the classification is done in two classes, green and red, and the input features are shown in the outer crown. The high number of rules is an indication of the complexity of the system: in a two-dimensional analogy, a large number of rectangles (rules) is needed to best approximate the complicated shape of the SVDD. We discuss these concepts in more detail in Section 4, dedicated to applications.


4 APPLICATIONS

Finally, in this section we investigate how SVDD works in real classification problems. First we focus on a simple example concerning the stability certification of dynamical systems through the ROA [14], where we want to focus on the performance of rule extraction, and then we move on to a much more complex and safety-relevant automotive example of a cyber-physical system [22]: vehicle platooning [23].

4.1 ROA Inference

The concept of Region of Attraction (ROA) is fundamental in the stability analysis of dynamical systems [20], [35], and it is topical when the safety of a cyber-physical system should be preserved with zero (probabilistic) error [12], [13].

The ROA is typically derived through the level sets of Lyapunov functions, but in this case we want to estimate the ROA through negative SVDD: we define the target class as the set of stable points and the negative class as the unstable ones. We consider the Van der Pol oscillator in reverse time:

ẋ1 = −x2,      ẋ2 = x1 + (x1² − 1) x2   (30)

The stability region is depicted in blue in Figure 7. The system has one equilibrium point at the origin and an unstable limit cycle on the border of the true ROA.

Figure 7. ROA of the Van der Pol oscillator. In yellow the SVDD shapeobtained through fast-SVDD as in Section 2.3.

The simulation of the dynamical system is developed in C [18] and the dataset is composed of 300000 points (x1, x2) with the corresponding labels (+1 stable, −1 unstable). Due to the large size of the dataset, a fast SVDD as in Section 2.3 is required. We implemented the negative SVDD and tested it on this dataset: we obtained good results (in terms of zero FNR) without using Algorithm 1 or Algorithm 2, thanks to the good separation between the two classes. Figure 7 shows the SVDD shape (in yellow); the performance indices are:

ACC = 0.9854,   FNR = 0,   FPR = 0.0542   (31)

where ACC = (TP + TN)/(TP + TN + FP + FN) is the accuracy of the model, FNR = FN/(FN + TP) is the False Negative Rate and FPR = FP/(FP + TN) is the False Positive Rate.
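These indices follow directly from the confusion matrix; a small helper using the standard definitions:

```python
def rates(tp, tn, fp, fn):
    """Confusion-matrix indices as used in eq. (31)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    fnr = fn / (fn + tp)   # fraction of actual positives wrongly rejected
    fpr = fp / (fp + tn)   # fraction of actual negatives wrongly accepted
    return acc, fnr, fpr
```

For instance, with TP = 50, TN = 40, FP = 10 and FN = 0, `rates(50, 40, 10, 0)` gives ACC = 0.9, FNR = 0 and FPR = 0.2.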

Then a set of intelligible rules is extracted as described in Section 3 (LLM and DT) and tested on several extractions of datasets of different sizes (see Figure 8), which are all copies of the same dataset [18], with the aim of profiling the largest region in terms of "safe points", that is, the precision on the target class TP/(TP + FP).

Figure 8. Comparison of the percentage of safe points with LLM/DTbefore and after SVDD, VdP example.

We made 10³ successive extractions from the dataset (with different sizes, from 8% up to 50% of the total points): for each of them the FNR is almost zero and the precision on the target class is high, i.e. there is a good percentage of safe points. We can see that the performance of the rules extracted with DT after applying SVDD is somewhat inferior to the others. This is due to the fact that DT generates fewer rules than LLM, and the constraint imposed by the shape of the SVDD does not allow it to generate rules with high coverage (i.e., many small rectangles would be needed).

4.2 Vehicle Platooning

Vehicle Platooning (VP) is taken as a reference here as being representative of one of the most challenging CPSs of the automotive sector [22]. The main goal in VP is finding the best trade-off between performance (i.e. maximising speed and minimising the reciprocal distance of the vehicles) and safety (i.e. avoiding collisions) [9]. Most of the literature on this topic focuses on advanced control schemes while abstracting away the communication medium. The communication delay is typically considered fixed or described through probabilistic models. This allows the analytical derivation of stability models under some hypotheses on the dynamical system [17], but it may be unreliable under realistic conditions. Two branches are evident in the literature in this respect: the derivation of simple models of the delay bound that guarantees safety (see, e.g., Section IV.C of [34]) and extensive simulation with visualisation of safety regions under subsets of parameters when addressing realistic communication [7], [27] and realistic vehicles [25].

The following scenario is considered. Given the platoon at a steady state of speed and reciprocal distance of the vehicles, a braking is applied by the leader of the platoon [25], [34]. The behaviour of the dynamical system is investigated with respect to the following metrics. Safety refers to collisions between adjacent vehicles (in the study, a collision is actually registered when the reciprocal distance between vehicles reaches a lower bound, e.g. 2 m). For both safety and driving comfort, string stability (SS) is also important.


Figure 9. Scatter plots of the quantities of the platooning dynamical system as in [11], [12], [13]. Non-collision points are plotted in blue, collision points in red.

Table 1
Results on the VP data set.

        FNR      % safe   #iter   #time (s)   R²       #SV
Alg 1   0.0993   71.34    15      192.12      0.9220   139
Alg 2   0.0556   78.06    42      310.31      0.8158   61

This means that speed and acceleration fluctuations should be attenuated downstream along the string of vehicles.

The dynamics of the system are generated by the following differential equations [34]:

v̇i = (1/mi)(Fi − (ai + bi · vi²)),      ḋi = v(i−1) − vi   (32)

where vi is the speed of vehicle i, mi the mass of vehicle i, di the distance of vehicle i from the preceding vehicle i − 1, ai the tyre/road rolling resistance, bi the aerodynamic drag, and Fi the control law.

The behaviour of the dynamical system is synthesised by the following vector of features:

I = [N, ι(0), F0, m, q, p]   (33)

where N + 1 is the number of vehicles in the platoon (subscript i = 0 denotes the leader), ι = [d, v, a] are the vectors of reciprocal distance, speed and acceleration of the vehicles, respectively (ι(0) denotes the quantities sampled at time t = 0, after which a braking force is applied by the leader [25]; simulations are set up so as to manage possible transient periods and achieve a steady state of ι before applying the braking), m is the vector of vehicle weights, F0 is the braking force applied by the leader, q is the vector of quality measures of the communication medium (a fixed delay and the packet error rate (PER) are considered in the study), and p is the vector of tuning parameters of the control scheme.

The Plexe simulator [25], [34] is used to register i = 15 × 10³ observations, which we then reduced according to the following ranges: N ∈ [3, 8], F0 ∈ [−8, −1] × 10³ N (from now on, the notation ×10³ is omitted when referring to thresholds applied to F0), PER ∈ [0, 0.5], d(0) ∈ [4, 9] m, v(0) ∈ [10, 90] km/h. With these choices the size of the sample has been reduced to 7567 samples (see Fig. 9).

Our goal is to determine the largest region of parameters with no false negatives (i.e., prediction of no collision, but a collision in reality). To this end, we applied the two algorithms proposed in Section 2.4 to the 7567-point sample above (a Fast-SVDD is used, see Section 2.3), with an RBF kernel, C1 = 1, C2 = 1, and σ determined by cross-validation. The results are shown in Table 1, where FNR is the usual false negative rate, % safe is the percentage of safe points (computed as the precision on the positive class, TP/(TP + FP)), #iter the number of algorithm iterations, #time (s) the time in seconds to convergence, R^2 the squared radius of the hypersphere, and #SV the number of determined support vectors.
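As a minimal sketch, the two main metrics can be computed as follows. The label convention and function names are illustrative, not taken from the paper's code:

```python
def safety_metrics(y_true, y_pred):
    """Compute the FNR and the percentage of safe points.

    Label convention (illustrative): 1 = collision, 0 = no collision.
    A false negative is a point predicted as no-collision that is a
    collision in reality, i.e. the error the method drives to zero.
    """
    pairs = list(zip(y_true, y_pred))
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    # FNR over the points that are collisions in reality
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    # "% safe": precision on the points predicted as safe
    pred_safe = sum(1 for _, p in pairs if p == 0)
    true_safe = sum(1 for t, p in pairs if t == 0 and p == 0)
    pct_safe = true_safe / pred_safe if pred_safe else 0.0
    return fnr, pct_safe

# Toy example: 5 points, one dangerous miss (collision predicted safe)
fnr, pct = safety_metrics([1, 0, 0, 1, 0], [1, 0, 0, 0, 0])
```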

We then tested the performance of the algorithms on 10^3 different extractions of subsets, with sizes ranging from 8% to 50% of the total points available for testing; 11 × 10^3 trials in total. We compared them with LLM and DT as in [12] (see Figure 10), which requires rule extraction (see Section 3). LLM and DT are tuned according to [12] (Section 4.4). The procedure can be briefly summarized as follows: (1) manual inspection of the most relevant regions for safety; (2) LLM/DT is trained with zero error when developing the rules; (3) unsafe points are progressively removed from the original data set until only safe points remain.
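Step (3) of the procedure above can be sketched as an iterative retraining loop. Here `fit_rules` and `predict` are placeholder hooks for the LLM/DT training and prediction routines, not the paper's actual API:

```python
def shrink_to_safe(X, y, fit_rules, predict, max_iter=100):
    """Progressively drop unsafe points that the rule model still
    covers as safe, and retrain, until no such point remains.

    Label convention (illustrative): 1 = unsafe (collision), 0 = safe.
    """
    X, y = list(X), list(y)
    model = None
    for _ in range(max_iter):
        model = fit_rules(X, y)
        y_hat = predict(model, X)
        # unsafe points wrongly predicted as safe
        wrong = {i for i, (t, p) in enumerate(zip(y, y_hat))
                 if t == 1 and p == 0}
        if not wrong:
            break
        X = [x for i, x in enumerate(X) if i not in wrong]
        y = [t for i, t in enumerate(y) if i not in wrong]
    return model, X, y

# Toy check: with a degenerate "always safe" rule model, the loop
# ends up keeping only the genuinely safe points.
model, X_kept, y_kept = shrink_to_safe(
    [[0.0], [1.0], [2.0]], [0, 1, 0],
    fit_rules=lambda X, y: None,
    predict=lambda m, X: [0] * len(X))
```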

The analysis shows that SVDD identifies the best safety region within the chosen parameter ranges: up to 70% of safe points with almost zero FNR for Algorithm 1, and up to 80% for Algorithm 2. The comparison with the other methods


Figure 10. Comparison of the percentage of safe points with LLM/DT before and after SVDD, Platooning example.

shows that the rules extracted with SVDD are the best ones but, due to the complex shape of the SVDD boundary function, a higher number of them is required: for LLM, 674 rules for Algorithm 1 and 771 for Algorithm 2; for DT, 229 rules for Algorithm 1 and 314 for Algorithm 2; against 5 rules for LLM and 3 rules for DT without SVDD. The rules are applied all together in logical OR (∨).
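The OR combination of the extracted rules can be sketched as follows; the rule predicates below are toy examples on (d(0), v(0)), not the rules actually extracted in the paper:

```python
def in_safety_region(x, rules):
    """A point is declared safe if at least one rule covers it:
    the rules are combined in logical OR."""
    return any(rule(x) for rule in rules)

# Toy interval rules, mimicking the box-like shape of DT/LLM rules
rules = [
    lambda x: x[0] > 7.0 and x[1] < 40.0,   # large gap, moderate speed
    lambda x: x[0] > 8.5,                   # very large gap
]
safe = in_safety_region((7.5, 30.0), rules)
```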

5 CONCLUSION AND FUTURE WORK

The study shows how SVDD can be a very useful method for identifying safety regions, even in complex applications such as VP. This paper also provides a detailed methodology on how to deal with practical machine learning problems, such as parameter tuning and handling large data sets. In addition, a more thorough explanation of negative SVDD has been given. The proposed approach can thus be applied to a wide range of applications.

In the future, it will be interesting to study a method for direct rule extraction from SVDD, like the one developed for SVM in [16].

REFERENCES

[1] Abe, S.: Support Vector Machines for Pattern Classification (Advances in Pattern Recognition), 2nd ed. Springer-Verlag London Ltd., 2010.

[2] Boros, E., Hammer, P.L., Ibaraki, T., et al.: ‘An implementation of logical analysis of data’, IEEE Trans. Knowl. Data Eng., 2000, 12, (2), pp. 292–306

[3] Balasubramanian, V.N., Ho, S.S., Vovk, V.: Conformal Prediction for Reliable Machine Learning. Morgan Kaufmann/Elsevier, Waltham, MA, USA, 1st ed., 2014, ISBN 9780123985378.

[4] Chaudhuri, A., Kakde, D., Jahja, M., Xiao, W., Kong, S., Jiang, H., Peredriy, S.: Sampling Method for Fast Training of Support Vector Data Description. arXiv e-prints, arXiv:1606.05382, 2016.

[5] European Union Aviation Safety Agency: Concepts of Design Assurance for Neural Networks (CoDANN). EASA AI Task Force, Daedalean AG, Mar. 2020.

[6] Fisch, D., Hofmann, A., Sick, B.: ‘On the versatility of radial basis function neural networks: a case study in the field of intrusion detection’, Inf. Sci., 2010, 180, (12), pp. 2421–2439. Available at http://www.sciencedirect.com/science/article/pii/S0020025510001015

[7] Ge, J.I., Orosz, G.: ‘Dynamics of connected vehicle systems with delayed acceleration feedback’, Transp. Res. C, Emerg. Technol., 2014, 46, pp. 46–64

[8] Huang, G., Chen, H., Zhou, Z., Yin, F., Guo, K.: Two-class support vector data description. Pattern Recognition, 44 (2011), pp. 320–329

[9] Jia, D., Lu, K., Wang, J., et al.: ‘A survey on platoon-based vehicularcyber-physical systems’, IEEE Commun. Surv. Tutor., 2016, 18, (1),pp. 263–284

[10] Jones, C.A.: Lecture notes: Math2640 Introduction to Optimisation 4. University of Leeds, School of Mathematics, Tech. Rep., 2005.

[11] Mongelli, M., Muselli, M., Scorzoni, A., Ferrari, E.: Accelerating PRISM Validation of Vehicle Platooning Through Machine Learning. (2019), pp. 452–456, doi: 10.1109/ICSRS48664.2019.8987672

[12] Mongelli, M., Muselli, M., Ferrari, E., Fermi, A.: Performance validation of vehicle platooning via intelligible analytics. IET Cyber-Physical Systems: Theory & Applications, 2018, 4, doi: 10.1049/iet-cps.2018.5055

[13] Fermi, A., Mongelli, M., Muselli, M., Ferrari, E.: "Identification of safety regions in vehicle platooning via machine learning," 2018 14th IEEE International Workshop on Factory Communication Systems (WFCS), Imperia, Italy, 2018, pp. 1–4, doi: 10.1109/WFCS.2018.8402372

[14] Mongelli, M., Orani, V.: "Stability Certification of Dynamical Systems: Lyapunov Logic Learning Machine". IEEE Control Decision Conference 2020.

[15] Muselli, M., Ferrari, E.: ‘Coupling logical analysis of data and shadow clustering for partially defined positive Boolean function reconstruction’, IEEE Trans. Knowl. Data Eng., 2011, 23, (1), pp. 37–50

[16] Nunez, H., Angulo, C., Català, A.: Rule-Based Learning Systems for Support Vector Machines. Neural Processing Letters (2006), 24, pp. 1–18

[17] Oncu, S., van de Wouw, N., Nijmeijer, H.: ‘Cooperative adaptivecruise control: tradeoffs between control and network specifica-tions’. 2011 14th Int. IEEE Conf. on Intelligent TransportationSystems (ITSC), Washington, DC, USA, 2011, pp. 2051–2056

[18] Mongelli, M., Orani, V.: Git repository of Lyapunov Logic Learning Machine. [Online]. Available: https://github.com/mopamopa/Liapunov-Logic-Learning-Machine

[19] KEEL, "Website: KEEL (Knowledge Extraction based on Evolutionary Learning)," Nov. 2012. [Online]. Available: http://sci2s.ugr.es/keel/datasets.php

[20] Khalil, H.: Nonlinear Systems, 3rd ed. Prentice Hall, 2002.

[21] Kools, J.: 6 functions for generating artificial datasets (https://www.mathworks.com/matlabcentral/fileexchange/41459-6-functions-for-generating-artificial-datasets), MATLAB Central File Exchange. Retrieved April 4, 2021.

[22] Pop, P., Scholle, D., Hansson, H., et al.: ‘The SafeCOP ECSEL project: safe cooperating cyber-physical systems using wireless communication’. 2016 Euromicro Conf. on Digital System Design (DSD), Limassol, Cyprus, 2016, pp. 532–538

[23] Pop, P., Scholle, D., Sljivo, I., et al.: ‘Safe cooperating cyber-physical systems using wireless communication’, Microprocess. Microsyst., 2017, 53, pp. 42–50

[24] Czarnecki, K., Salay, R.: Towards a Framework to Manage Perceptual Uncertainty for Safe Automated Driving, International Workshop on Artificial Intelligence Safety Engineering (WAISE), 2018. Springer, Västerås, Sweden.

[25] Santini, S., Salvi, A., Valente, A.S., et al.: ‘A consensus-basedapproach for platooning with intervehicular communications andits validation in realistic scenarios’, IEEE Trans. Veh. Technol.,2017, 66, (3), pp. 1985–1999

[26] Standardization in the area of Artificial Intelligence, ISO/IEC. Creation date 2017, Washington, DC 20036, USA. Also available at https://www.iso.org/committee/6794475.html

[27] Segata, M., Cigno, R.L.: ‘Automatic emergency braking: realistic analysis of car dynamics and network performance’, IEEE Trans. Veh. Technol., 2013, 62, (9), pp. 4150–4161

[28] Road vehicles – Safety of the intended functionality, PD ISO/PAS 21448:2019. International Organization for Standardization, Geneva, CH.

[29] Tax, D.M.J., Duin, R.P.W.: Support vector domain description. Pattern Recognition Letters, 20 (1999), pp. 1191–1199

[30] Tax, D.M.J., Duin, R.P.W.: Support Vector Data Description. Machine Learning, 54, pp. 45–66, 2004


[31] Tax, D.M.J.: One-class classification: concept-learning in the absence of counter-examples. Ph.D. dissertation, Delft University of Technology, 2001.

[32] Theissler, A., Dear, I.: Autonomously determining the parameters for SVDD with RBF kernel from a one-class training set. WASET International Conference on Machine Intelligence, Stockholm, 2013.

[33] Vapnik, V.: The Nature of Statistical Learning Theory, Springer,New York, 1995.

[34] Xu, L., Wang, L.Y., Yin, G., et al.: ‘Communication informationstructures and contents for enhanced safety of highway vehicleplatoons’, IEEE Trans. Veh. Technol., 2014, 63, (9), pp. 4206–4220

[35] Zhai, C., Nguyen, H.D.: Region of attraction for power systems using Gaussian process and converse Lyapunov function – part I: theoretical framework and off-line study, 2019.

Alberto Carlevaro received the Master Degree in Applied Mathematics in May 2020 from the University of Genoa, with 110 out of 110 cum laude, with a physics-mathematics thesis on the behavior of liquid crystals under electromagnetic fields. He was a research fellow at the Institute of Electronics, Computer and Telecommunications Engineering (IEIIT) of the National Research Council (CNR), where he worked on Machine Learning and Explainable AI in collaboration with Rulex Inc. He is now a PhD student in the Department of Electrical, Electronic and Telecommunications Engineering and Naval Architecture (DITEN) on the research topic "Traffic Analysis in the Smart City", in collaboration with CNR and the S.M.E. Aitek. His current fields of research are Machine Learning, Federated Learning, and Explainable AI.

Maurizio Mongelli obtained his Ph.D. Degree in Electronics and Computer Engineering from the University of Genoa (UNIGE) in 2004. The doctorate was funded by Selex Communications S.p.A. (Selex). He worked for both Selex and the Italian Telecommunications Consortium (CNIT) from 2001 until 2010. During his doctorate and in the following years, he worked on quality of service for military networks with Selex. From 2007 to 2008, he coordinated a joint laboratory between UNIGE and Selex dedicated to the study and prototype implementation of Ethernet resilience mechanisms. He was the CNIT technical coordinator of a research project concerning satellite emulation systems, funded by the European Space Agency, and spent three months working on the project at the German Aerospace Center in Munich. Since 2012 he has been a researcher at the Institute of Electronics, Computer and Telecommunication Engineering (IEIIT) of the National Research Council (CNR), where he deals with machine learning and cyber security, with responsibility and coordination, for the CNR side, of five funded projects (one at European level) in these sectors. He is co-author of over 100 international scientific papers and 2 patents.