Consumers’ Preferences Modeling With Multiclass Fuzzy Support Vector Machines




    Chih-Chieh Yang

    Department of Multimedia and Entertainment Science, Southern Taiwan University

    No. 1, Nantai Street, Yongkang City, Tainan County, Taiwan 71005

    Meng-Dar Shieh

Department of Industrial Design, National Cheng Kung University, Tainan, Taiwan 70101

    Abstract

Consumers' preferences toward product design are often affected by a large number of form features. It is very important for product designers to understand the relationship between consumers' preferences and product form features. In this paper, an approach based on multiclass fuzzy support vector machines (multiclass fuzzy SVM) is proposed to construct a prediction model of consumers' preferences. Product samples are collected and their form features are systematically examined. Each product sample is assigned a class label and a fuzzy membership expressing the degree of agreement with this label, formulating a multiclass classification problem. A one-versus-one multiclass fuzzy SVM model is constructed using the collected product samples. The optimal training parameter set of the model is determined by a two-step cross-validation. A case study of mobile phone design is also given to demonstrate the effectiveness of the proposed methodology. Two standard kernel functions, the polynomial kernel and the Gaussian kernel, are used and their performance is compared. The experimental results show that the Gaussian kernel model performs better than the polynomial model; it performed very well and is capable of preventing overfitting.

Keywords: Consumers' preferences; Multiclass fuzzy support vector machines; Mobile phone design

    1. Introduction

The appearance of a product is one of the most important factors affecting consumers' purchase decisions. Traditionally, the quality of product form design depends heavily on designers' intuition, which is not proven to lead to success in the


marketplace. In order to understand consumers' preferences and develop appealing products in a more effective manner, much research has been done to study product form design with systematic approaches. The most noticeable research is Kansei Engineering, proposed by (Jindo, Hirasago et al. 1995). The main issue is how to deal with the correlations between attributes and take care of the nonlinear properties of attributes (Shimizu and Jindo 1995; Park and Han 2004). The techniques most widely adopted in the product design field, such as multiple regression analysis (Park and Han 2004) or multivariate analysis (Shimizu and Jindo 1995), depend heavily on the assumptions of independence and linearity and hence cannot deal with the nonlinearity of the relationship effectively. In addition, prior to establishing a mathematical model, data simplification and variable screening are often needed to obtain better results (Hsu, Chuang et al. 2000). Fuzzy regression analysis (Shimizu and Jindo 1995) and other methods suffer from the same shortcomings (Park and Han 2004).

(Vapnik 1995) developed a new kind of algorithm called the support vector machine (SVM). SVM has been shown to provide higher performance than traditional learning techniques (Burges 1998). SVMs' remarkable and robust performance with respect to sparse and noisy data makes them a first choice in a number of applications such as pattern recognition (Burges 1998) and bioinformatics (Scholkopf, Guyon et al. 2001). SVM is known for its elegance in solving nonlinear problems with the technique of kernels, which automatically perform a nonlinear mapping to a feature space. As a consequence, the nonlinear relationship between product form features can be processed effectively by introducing a suitable kernel function.

This study proposes an approach based on multiclass fuzzy SVM for modeling consumers' preferences. The approach begins by processing product forms with discrete and continuous attributes and can also deal with sparse feature vectors. Each product sample is assigned a class label and a fuzzy membership describing the semantic differential score of agreement with this label. A one-versus-one multiclass fuzzy SVM model is constructed using the collected product samples. The optimal training parameter set of the model is determined by a two-step cross-validation. The remainder of the paper is organized as follows. Section 2 gives an introduction to multiclass fuzzy SVM. Section 3 presents the proposed prediction model of consumers' preferences. Section 4 demonstrates the experimental results of the proposed model using mobile phone design as an example. Finally, Section 5 presents some brief conclusions and suggestions for future work.

    2. Multiclass fuzzy support vector machines

    2.1. Fuzzy support vector machines for binary classification


An SVM maps the input points into a high-dimensional feature space and finds a separating hyperplane that maximizes the margin between the two classes in that space. Maximizing the margin is a quadratic programming (QP) problem that can be solved via its dual problem by introducing Lagrange multipliers. Without any explicit knowledge of the mapping, the SVM finds the optimal hyperplane by using dot products in the feature space with the aid of kernels. The solution of the optimal hyperplane can be written as a combination of a few input points, which are called support vectors.

In many real-world applications, input samples may not be exactly assigned to one class, and the effects of the training samples might differ. Some are more important and should be fully assigned to one class so that the SVM can separate them more correctly. Other samples might be noisy and less meaningful and should be discarded. Treating every data sample equally may cause overfitting. The original SVM lacks this kind of ability. (Huang and Liu 2002; Lin and Wang 2002) proposed the concept of fuzzy SVM, which combines fuzzy logic and SVM to make different training samples contribute differently to their own class. The core of their concept is to fuzzify the training set and assign each data sample a membership value according to its relative importance in the class. A description of fuzzy SVM is given in the Appendix.
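To make the idea concrete, the following minimal sketch emulates fuzzy-SVM-style training with scikit-learn; this is an implementation choice made here for illustration, not a tool used in the paper. The per-sample weights passed to sample_weight scale the penalty C of each sample, which mirrors the $\mu_i C$ upper bound on $\alpha_i$ in the fuzzy SVM dual (see the Appendix). The data are synthetic placeholders.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
    y = np.array([-1] * 20 + [+1] * 20)
    mu = rng.uniform(0.3, 1.0, size=40)    # fuzzy memberships in (0, 1]

    clf = SVC(kernel="linear", C=10.0)
    clf.fit(X, y, sample_weight=mu)        # effective penalty becomes mu_i * C

    print(clf.support_)                    # indices of the support vectors
    print(clf.decision_function(X[:3]))    # D(x) before taking the sign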

Figure 1 illustrates a simplified binary classification problem with only two attributes, trained by fuzzy SVM using a linear kernel. Since all data samples have only two attributes, the data points can be plotted in the 2D plane to explain the training results in a more intuitive manner. Red and blue disks are the two classes of training samples. Grey values indicate the value of the argument $\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$ of Eq. (14) in the Appendix. A new data sample without a given class label can then be discriminated according to Eq. (14). In Figure 1 the middle solid line is the decision surface; data points lying on this surface satisfy $D(x) = 0$. The outer dashed lines precisely meet the constraint of Eq. (3) in the Appendix, and data points lying on this margin satisfy $D(x) = 1$ or $D(x) = -1$.

In addition, support vectors are very useful in data analysis and interpretation. In the original definition of SVM, the data points satisfying the condition $\alpha_i > 0$ are called support vectors. In fuzzy SVM, the same value of $\alpha_i$ may indicate a different type of support vector due to the factor $\mu_i$ (Lin and Wang 2002). The one with corresponding $\alpha_i = \mu_i C$ is misclassified. The one with corresponding $0 < \alpha_i < \mu_i C$ lies on the margin of the hyperplane (marked by extra circles in Figure 1).

Different fuzzy memberships were applied to the two classes to demonstrate the training effect of $\mu_i$. In Figure 1(a), both classes are assigned memberships equal to 1.


Figure 2. The relationship between parameter C and the training margin (polynomial kernel with degree 2); panels show C = 10000, 1000, 100, 10, 1, and 0.1.

    2.2 One-versus-one multiclass support vector machines

In the previous section we described the concept of fuzzy SVM. However, fuzzy SVM is still limited to binary classification. How to effectively extend SVM to multiclass classification is still an ongoing research issue. (Hsu and Lin 2001; Duan and Keerthi 2005) compared the performance of several multiclass SVM methods based on binary SVM, including one-versus-rest (OVR) (Bottou, Cortes et al. 1994), one-versus-one (OVO) (Krebel 1999), and directed acyclic graph SVM (DAGSVM) (Platt, Cristianini et al. 2000).


Two characteristics of product form features are considered in this study. Firstly, the form feature vector is often sparse. This is mainly because there often exist large numbers of features to represent product form design, and each product sample does not necessarily occupy all form features. The number of active or non-zero features in a feature vector is lower than the total number of features. This situation is very common in product form feature representation (Kwahk and Han 2002). Secondly, product form features are often mixed, with two kinds of attributes denoted as discrete or continuous. Discrete attributes denote categorical choices among a fixed number of options, such as types of texture, material used in parts, etc. Continuous attributes, such as length and proportion, have some kind of scale or can be measured, and the domain of the variable is continuous without interruption. SVM can deal with mixed discrete and continuous attributes at the same time. Since SVM requires that each data sample be represented as a vector of real numbers, discrete attributes can be represented as integer numbers. Taking a three-category attribute {circle, rectangle, triangle} for example, it can be coded as {1, 2, 3}. As for continuous attributes, because kernel values usually depend on the inner products of feature vectors (e.g. the linear kernel and polynomial kernel), large attribute values might cause numerical problems (Hsu, Chang et al. 2003). Continuous attributes are therefore linearly scaled to the range [0, 1] to avoid numerical difficulties during calculation; a minimal sketch of this preparation is given below.
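The following sketch illustrates the encoding and scaling just described. The data values and attribute names are hypothetical, and NumPy is an implementation convenience, not something prescribed by the paper.

    import numpy as np

    # Discrete attribute: categorical choices coded as integers.
    body_type = {"block": 1, "flip": 2, "slide": 3}
    samples = [
        {"length": 105.0, "width": 45.0, "body": "block"},
        {"length": 88.0,  "width": 42.0, "body": "slide"},
        {"length": 97.0,  "width": 48.0, "body": "flip"},
    ]

    def scale01(v):
        # Linearly scale a continuous attribute to the range [0, 1].
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min())

    X = np.column_stack([
        scale01([s["length"] for s in samples]),
        scale01([s["width"] for s in samples]),
        [body_type[s["body"]] for s in samples],
    ])
    print(X)  # each row is one product sample's feature vector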

    3.2. Describing consumers preferences using class labels

The concept of product positioning was borrowed to describe consumers' preferences toward product form design. (Kaul and Rao 1995) suggested that a company should provide an array of products in the marketplace in order to meet the needs of each homogeneous consumer segment. Vice versa, consumers often make choices in the marketplace according to the perceived product attributes. Based on this idea, product samples are assumed to be distinguished by consumers and classified into different groups. Managerial decisions can be made more effectively by identifying the relative importance attached to various product attributes. Taking mobile phone design for example, class labels such as sports, simplicity, female, plain, and business are used to describe the different product divisions provided in the marketplace. Although other product characteristics (brand, price, etc.) may affect consumers' subjective perceptions, the authors emphasize only the factors in product form design. Other marketing strategies that may influence the decisions of consumers are beyond the scope of this study.

    3.3. Collecting product samples


A total of 69 mobile phones were collected from the Taiwan market in 2006. Three product designers, each with at least 5 years of experience, conducted the product form feature analysis. They first examined the main component structure using the method proposed in (Kwahk and Han 2002) and then used this structure to analyze all product samples. The form features of each product sample were discussed by all designers to determine one unified representation. Continuous attributes were recorded directly, while discrete attributes were processed by the method described in Section 3.1. The color and texture information of the product samples was ignored; only the form features were considered. All entries in the feature matrix were prepared for training the multiclass fuzzy SVM. Five class labels, sports, simplicity, female, plain, and business, were chosen for semantic evaluations. In order to collect consumers' perception data for mobile phone design, 30 subjects, including 15 males and 15 females, were asked to evaluate all product samples using the five selected class labels. Each subject was asked to choose the most suitable class label for representing each product sample, and to evaluate each sample on a semantic differential scale from 0 (very low) to 1 (very high). Since each product sample had only a single instance when training the multiclass fuzzy SVM model, the most frequently assigned label was used to represent each product sample. Training multiple instances of samples is another interesting issue worthy of further research. The selected class label is assigned +1, and the rest of the labels are assigned -1. The semantic differential score is directly stored as the membership value for fuzzy SVM training; a sketch of this aggregation follows.
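As an illustration, the sketch below aggregates hypothetical subject votes for one product sample. The paper does not specify how the 30 subjects' scores are combined into one membership value; averaging the scores of the winning label is an assumption made here.

    from collections import Counter
    import numpy as np

    # Hypothetical votes: (chosen label, semantic differential score in [0, 1]).
    votes = [("plain", 0.5), ("plain", 0.6), ("female", 0.8), ("plain", 0.4)]

    winner, _ = Counter(label for label, _ in votes).most_common(1)[0]
    membership = np.mean([s for label, s in votes if label == winner])  # assumption

    labels = ["sports", "simplicity", "female", "plain", "business"]
    y = [+1 if name == winner else -1 for name in labels]
    print(winner, y, membership)  # plain [-1, -1, -1, +1, -1] 0.5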

    3.4. Constructing multiclass fuzzy SVM model

In this study, each product sample is assigned a class label to formulate a multiclass classification problem. This problem is then divided into a series of OVO SVM sub-problems. The objective of multiclass classification is to correctly discriminate these classes from each other, and each OVO problem addresses a pair of different class labels (e.g. sports versus simplicity). Each classifier uses the fuzzy SVM to define a hyperplane that best separates product samples into two classes. Each test sample is sequentially presented to each of the $5 \times (5-1)/2 = 10$ OVO classifiers, and its label is predicted as the class receiving the largest vote, as sketched below.
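The following minimal sketch shows the OVO training and voting scheme, assuming scikit-learn. The feature matrix and labels are random placeholders, and the kernel parameters reuse the values reported later in Section 4.3 purely for illustration.

    from itertools import combinations
    import numpy as np
    from sklearn.svm import SVC

    labels = ["sports", "simplicity", "female", "plain", "business"]
    rng = np.random.default_rng(1)
    X = rng.normal(size=(69, 6))       # placeholder feature matrix
    y = rng.integers(0, 5, size=69)    # placeholder class indices

    classifiers = {}
    for a, b in combinations(range(5), 2):               # 5 * (5-1) / 2 = 10 pairs
        mask = (y == a) | (y == b)
        clf = SVC(kernel="rbf", C=40, gamma=1 / (2 * 4))  # sigma^2 = 4 (Sec. 4.3)
        clf.fit(X[mask], y[mask])
        classifiers[(a, b)] = clf

    def predict(x):
        votes = np.zeros(5, dtype=int)
        for clf in classifiers.values():
            votes[clf.predict(x.reshape(1, -1))[0]] += 1  # one vote per pair
        return labels[votes.argmax()]                     # largest vote wins

    print(predict(X[0]))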

    3.5. Choosing optimal parameters using cross-validation


Since the number of product samples is limited, it is important to obtain the best generalization performance and reduce overfitting. A practical implementation is to partition the data samples into training data and testing data. Various partition strategies have been proposed, including leave-one-out cross-validation (LOOCV), k-fold cross-validation, repeated random subsampling, and bootstrapping (Berrar, Bradbury et al. 2006). In this study, 5-fold cross-validation is used to choose the optimal parameters. The whole set of training samples is randomly divided into five subsets of approximately equal size. Each multiclass model is trained using $5 - 1 = 4$ subsets and tested using the remaining subset. Training is repeated five times, and the average testing error rate over the five held-out subsets is calculated, as in the sketch below.
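A minimal sketch of this 5-fold scheme, assuming scikit-learn and placeholder data; the fixed (C, gamma) values stand in for one candidate parameter set.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = rng.normal(size=(69, 6))       # placeholder data
    y = rng.integers(0, 5, size=69)

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    errors = []
    for train_idx, test_idx in kf.split(X):
        clf = SVC(kernel="rbf", C=10, gamma=0.1)   # one candidate parameter set
        clf.fit(X[train_idx], y[train_idx])        # train on 4 subsets
        errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

    print("average CV error rate:", np.mean(errors))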

The performance of the SVM model is heavily dependent on the regularization parameter $C$ and the parameter of the chosen kernel function. Taking the Gaussian kernel for example, each binary classifier requires the selection of two parameters: the regularization parameter $C$ and the kernel parameter $\sigma^2$. The $C$ and $\sigma^2$ of each classifier within the multiclass model are set to the same values for computational efficiency. Since cross-validation may be very time-consuming, a two-step grid search is conducted to find the optimal hyperparameter pair (Hsu, Chang et al. 2003). In the first step, a coarse grid search is performed using the following sets of values: $C \in \{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$ and $\sigma^2 \in \{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$. Thus 49 combinations of $C$ and $\sigma^2$ are tried in this step. An optimal pair $(C_0, \sigma_0^2)$ is selected from the coarse grid search. In the second step, a fine grid search is conducted around $(C_0, \sigma_0^2)$, where $C \in \{0.2C_0, 0.4C_0, \ldots, 0.8C_0, C_0, 2C_0, 4C_0, \ldots, 8C_0\}$ and $\sigma^2 \in \{0.2\sigma_0^2, 0.4\sigma_0^2, \ldots, 0.8\sigma_0^2, \sigma_0^2, 2\sigma_0^2, 4\sigma_0^2, \ldots, 8\sigma_0^2\}$. Altogether, 81 combinations of $C$ and $\sigma^2$ are tried in this step, and the optimal hyperparameter pair is selected from this fine search. Likewise, the same two-step grid search is repeated for the polynomial kernel. For the polynomial kernel, the coarse grid is taken as $C \in \{10^{-3}, \ldots, 10^{3}\}$ and $p \in \{1, 2, \ldots, 5\}$. When $(C_0, p_0)$ is determined, the range of the fine grid search is $C \in \{0.2C_0, 0.4C_0, \ldots, 0.8C_0, C_0, 2C_0, 4C_0, \ldots, 8C_0\}$ and $p \in \{0.2p_0, 0.4p_0, \ldots, 0.8p_0, p_0, 1.2p_0, 1.4p_0, \ldots, 1.8p_0\}$.

    After comparing the performance of all training models using different kernel


    functions and parameters, the best combination of parameters obtained by

    cross-validation is used to build the multiclass fuzzy SVM model.
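A minimal sketch of the two-step search, written around a hypothetical cv_error(C, sigma2) helper that would return the 5-fold cross-validation error rate of the multiclass model for one parameter pair:

    import numpy as np

    def two_step_search(cv_error):
        # Step 1: coarse grid, 7 x 7 = 49 combinations.
        coarse = [10.0 ** k for k in range(-3, 4)]
        C0, s0 = min(((C, s) for C in coarse for s in coarse),
                     key=lambda pair: cv_error(*pair))
        # Step 2: fine grid around (C0, s0), 9 x 9 = 81 combinations.
        factors = [0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0]
        return min(((f * C0, g * s0) for f in factors for g in factors),
                   key=lambda pair: cv_error(*pair))

    # Toy error surface standing in for the real cross-validation procedure:
    best = two_step_search(
        lambda C, s2: abs(np.log10(C) - 1.6) + abs(np.log10(s2) - 0.6))
    print(best)  # (40.0, 4.0) for this toy surface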

    4. Experimental results

    4.1. Data set

Mobile phone design was selected to demonstrate the proposed methodology. Table 1 shows a part of the product samples used in this study. The set $S_i\ (i = 1, 2, \ldots, 10)$ represents a part of the product samples to be analyzed; the set $X_i\ (i = 1, 2, \ldots, 6)$ denotes the product form feature attributes; the set $Y_i\ (i = 1, 2, \ldots, 5)$ represents the class labels; and $\mu$ is the membership value of the +1 class label of each product sample $S_i$. For the sake of simplicity, only six product form features are listed in the example of Table 1:

$X = \{X_1, X_2, X_3, X_4, X_5, X_6\}$ = {body length, body width, body thickness, body volume, body type, function button type}

Five class labels are used to describe consumers' subjective perceptions of mobile phone design. These class labels are listed as follows:

$Y = \{Y_1, Y_2, Y_3, Y_4, Y_5\}$ = {sports, simplicity, female, plain, business}

Taking product sample $S_1$ as an example, the consumer chose label $Y_4$ (plain) and the attitude toward $Y_4$ is 0.5. A complete list of all product form features is shown in Table 2.

Table 1. Part of the training product samples for mobile phone design.

    Sample  X1    X2    X3    X4    X5  X6  |  Y1  Y2  Y3  Y4  Y5  |  μ
    S1      0.75  0.45  0.72  0.62  2   3   |  -1  -1  -1  +1  -1  |  0.5
    S2      0.67  0.43  0.64  0.47  3   3   |  -1  -1  +1  -1  -1  |  0.8
    S3      0.79  0.42  0.57  0.48  1   3   |  +1  -1  -1  -1  -1  |  0.5
    S4      0.75  0.44  0.60  0.50  3   3   |  -1  -1  -1  +1  -1  |  0.6
    S5      0.67  0.42  0.77  0.54  2   2   |  -1  -1  +1  -1  -1  |  0.6
    S6      0.72  0.48  0.53  0.47  2   3   |  -1  -1  -1  +1  -1  |  0.9
    S7      1.00  0.44  0.56  0.63  1   1   |  -1  -1  +1  -1  -1  |  0.7
    S8      0.77  0.45  0.81  0.71  2   1   |  -1  -1  +1  -1  -1  |  1.0
    S9      0.75  0.45  0.72  0.62  2   3   |  -1  -1  +1  -1  -1  |  0.6
    S10     0.67  0.43  0.64  0.47  3   3   |  -1  -1  +1  -1  -1  |  0.8

Columns X1-X6 are the product form features; Y1-Y5 are the class labels; μ is the membership value of the +1 label.


Table 2. Complete list of product form features used in this study.

    Component        Form feature            Type        Attributes
    Body             Length (X1)             Continuous  None
                     Width (X2)              Continuous  None
                     Thickness (X3)          Continuous  None
                     Volume (X4)             Continuous  None
                     Type (X5)               Discrete    Block body (X51), Flip body (X52), Slide body (X53)
    Function button  Type (X6)               Discrete    (X61), (X62), (X63)
                     Style (X7)              Discrete    Round (X71), Square (X72), (X73)
    Number button    Shape (X8)              Discrete    Circular (X81), Regular (X82), Asymmetric (X83)
                     Arrangement (X9)        Discrete    Square (X91), Vertical (X92), Horizontal (X93)
                     Detail treatment (X10)  Discrete    (X101), (X102), (X103), (X104)
    Panel            Position (X11)          Discrete    Middle (X111), Upper (X112), Lower (X113), Full (X114)
                     Shape (X12)             Discrete    Square (X121), Fillet (X122), Shield (X123), Round (X124)


    4.2. Training effect of different kernel functions

The training effects of the polynomial kernel and the Gaussian kernel were investigated using the whole set of product samples. Average training accuracies of the kernel functions and the corresponding parameters are shown in Figure 3.

For the polynomial kernel in Figure 3(a), the average error rates of the linear kernel ($p = 1$) were larger than 40% for all values of parameter $C$. When $p = 2$, the regulating effect of parameter $C$ was most obvious: as $C$ decreased from 1000 to 0.001, the average error rate increased from 0% to 34.8%. This is because parameter $C$ adjusts the margin of the optimal hyperplane: training with a smaller $C$ results in a larger margin, so the training error can also increase. Parameter $C$ had a similar regulating effect when $p = 3$; however, the training error rate increased more drastically than for $p = 2$. Although the training accuracies of the polynomial kernel with $p > 1$ were all superior to the linear kernel, these models might suffer from overfitting and have poor generalization ability.

For the Gaussian kernel in Figure 3(b), the regulating effect of parameter $C$ was less pronounced than for the polynomial kernel across all kernel parameters $\sigma$. It has been reported in (Wang, Xu et al. 2003) that both too large and too small values of $\sigma$ lead to poor generalization performance. Our results exhibited similar effects of $\sigma$. For large values of $\sigma^2$, all training data were regarded as one data point; as a consequence, the training model cannot recognize new data and the training error rate is very high. On the other hand, for small values of $\sigma^2$, all training data were regarded as support vectors and could be separated correctly, so the training error rate declined dramatically. However, for untrained data, such a model may not give good results due to overfitting.

In general, the linear kernel performed worse than the nonlinear kernels. Unlike the linear kernel, the polynomial kernel and Gaussian kernel are capable of nonlinearly mapping the training samples into a higher-dimensional space, so they can handle the case where the relation between product form features and class labels is nonlinear. Since each kernel function has different properties and generalization performance, the advantages of different kernel functions can be combined by using mixtures of kernels (Smits and Jordaan 2002). In addition, there exist theorems that can help to build kernel functions that take domain knowledge into consideration (Barzilay and Brailovsky 1999); these issues are beyond the scope of this paper.


Figure 3. Average training accuracies using (a) the polynomial kernel and (b) the Gaussian kernel.


    4.3. Analysis of cross-validation process

In order to obtain the best performance and reduce overfitting of the training model, a two-step cross-validation process was used to determine the optimal parameters. Figure 4 shows the results of cross-validation for the polynomial kernel. The best parameter set $(C_0, p_0)$ obtained from the first step of the coarse grid search was $(100, 1)$, chosen for the lowest error rate of 71%. The optimal parameter pair $(C, p)$ obtained in the second step of the fine grid search was $(800, 1)$. The average error rate of the second step improved slightly to 68.1%. The results of cross-validation for the Gaussian kernel are shown in Figure 5. The best parameter set $(C_0, \sigma_0^2)$ obtained from the coarse grid search was $(10, 10)$. The optimal parameter set $(C, \sigma^2)$ obtained in the fine grid search was $(40, 4)$. The error rate also improved slightly, from 73.9% in the first step to 72.4% in the second step.

As shown in the previous section, if the training model is built with the whole data set using one of the parameter sets from the region with very low average error rates ($< 10\%$) in Figure 3, the training model can hardly avoid overfitting. An interesting result is that the best parameter sets obtained by cross-validation for both kernel functions seem to lie on the boundary of the region with very low average error rates. This indicates that the cross-validation process is capable of balancing the trade-off between improving training accuracy and preventing overfitting. Since the purpose of cross-validation is to search for the best combination of parameters, the accuracy of the individual training models in this process is not our concern, regardless of their high error rates (all larger than 65%). The optimal parameters of the polynomial kernel and Gaussian kernel obtained from cross-validation were then used to build the final training models.


Figure 4. Average training accuracy of cross-validation in (a) the coarse grid and (b) the fine grid using the polynomial kernel.


Figure 5. Average training accuracy of cross-validation in (a) the coarse grid and (b) the fine grid using the Gaussian kernel.


    4.4. Performance of the optimal training model

The best parameter sets of the polynomial kernel and Gaussian kernel obtained from the cross-validation process were both used to build multiclass fuzzy SVM training models. The average accuracy rate of the polynomial kernel model with $(C, p) = (800, 1)$ was 66.3%, while the average accuracy rate of the Gaussian kernel model with $(C, \sigma^2) = (40, 4)$ was 98.6%. Confusion matrices are used for further analysis, as shown in Table 3. Diagonal elements are the numbers of correctly classified samples, while off-diagonal elements indicate the numbers of misclassified samples. For the polynomial kernel model in Table 3(a), the most confusing class was female: more than half of the female samples were misclassified as plain, sports, or simplicity, and the accuracy was down to 20%. According to our observation, two characteristics of female product samples are the area of decoration and the color of the body. Since the color and texture of the product samples were ignored, these samples may not provide enough information for the polynomial kernel model to classify them correctly. For the Gaussian kernel model in Table 3(b), the model performed very well and had only one misclassified sample. The performance of the Gaussian kernel model with parameter set $(C, \sigma^2) = (40, 4)$ was better than that of the polynomial kernel model.

Table 3. Confusion matrices and accuracy rates of the optimal training models obtained from (a) the polynomial kernel and (b) the Gaussian kernel.

    (a)                          Predicted class
    Actual class   plain  sports  female  simplicity  business  Accuracy rate (%)
    plain           10      3       0        1           0          71.4
    sports           1     15       0        1           0          88.2
    female           1      5       2        2           0          20.0
    simplicity       1      0       0       15           1          88.2
    business         1      1       0        2           7          63.6
    Average accuracy rate                                           66.3

    (b)                          Predicted class
    Actual class   plain  sports  female  simplicity  business  Accuracy rate (%)
    plain           13      0       1        0           0          92.9
    sports           0     17       0        0           0         100.0
    female           0      0      10        0           0         100.0
    simplicity       0      0       0       17           0         100.0
    business         0      0       0        0          11         100.0
    Average accuracy rate                                           98.6
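For reference, the per-class and average accuracy rates in Table 3(a) can be reproduced from the confusion matrix itself; the sketch below, using NumPy as an implementation convenience, shows that the reported average is the mean of the per-class rates.

    import numpy as np

    labels = ["plain", "sports", "female", "simplicity", "business"]
    cm = np.array([[10,  3, 0,  1, 0],   # rows: actual class
                   [ 1, 15, 0,  1, 0],   # cols: predicted class
                   [ 1,  5, 2,  2, 0],
                   [ 1,  0, 0, 15, 1],
                   [ 1,  1, 0,  2, 7]])

    per_class = cm.diagonal() / cm.sum(axis=1)   # correct / total per actual class
    for name, acc in zip(labels, per_class):
        print(f"{name:>10}: {100 * acc:.1f}%")
    print(f"   average: {100 * per_class.mean():.1f}%")  # 66.3%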


5. Conclusions and future work

In this paper, an approach based on multiclass fuzzy SVM is proposed to develop a prediction model of consumers' preferences. The OVO multiclass fuzzy SVM model can deal with the nonlinear relationship between product form features by introducing a kernel function. The optimal training parameters were determined by a two-step cross-validation process. According to the experimental results on mobile phone design, the optimal training model was obtained by choosing the Gaussian kernel model with the lowest average cross-validation error rate of 72.4%. The parameter set $(C, \sigma^2)$ of the optimal training model was $(40, 4)$. The optimal Gaussian kernel model trained with all product samples also had a very high accuracy of 98.6%. As a consequence, the Gaussian kernel model is superior to the polynomial model. This result is consistent with the fact that the Gaussian kernel is popular and commonly used in many applications due to its good properties. Further discussions of the properties of the Gaussian kernel can be found in (Keerthi and Lin 2003; Wang, Xu et al. 2003).

Since our case study was developed on mobile phone design and used a relatively small number of product form features, the form features of other product categories such as consumer electronics, furniture, car design, etc. may have different characteristics to consider. A more comprehensive collection of different product samples is needed to study the effectiveness of the proposed multiclass fuzzy SVM model. Extending standard kernel functions such as the polynomial kernel and Gaussian kernel by considering the characteristics of product form features is also a very interesting issue and requires further study.

    Appendix: Fuzzy support vector machines

For a binary classification problem, a set $S$ of $l$ training samples is given, each represented as $(x_i, y_i, \mu_i)$, where $x_i$ is the feature vector, $y_i$ is the class label, and $\mu_i$ is the fuzzy membership. Each training sample belongs to one of two classes: each is given a label $y_i \in \{+1, -1\}$ and a fuzzy membership $\epsilon \le \mu_i \le 1$ with $i = 1, \ldots, l$, for sufficiently small $\epsilon > 0$. A data sample with $\mu_i = 0$ contributes nothing and can be removed from the training set without affecting the result. These training samples can be used to build a decision function (or discriminant function) $D(x)$, which is a scalar function of an input sample. Decision functions that are simple weighted sums of the input vector plus a bias are called linear discriminant functions (Duda and Hart 1973), denoted as


$$D(x) = w \cdot x + b \qquad (1)$$

where $w$ is the weight vector and $b$ is a bias value. $D(x) = 0$ can be seen as a hyperplane; $w$ is the normal vector of the separating plane, and the bias term $b$ is the offset of the hyperplane along its normal vector. A data set is said to be linearly separable if a linear discriminant function can separate it without error. In most cases, finding a suitable linear discriminant function is too restrictive to be of practical use. A solution to this situation is to map the original input space into a higher-dimensional feature space and search for the optimal hyperplane in that feature space. Let $z_i = \varphi(x_i)$ denote the corresponding feature space vector, with a mapping function $\varphi$ from $R^N$ to a feature space $Z$. The hyperplane can be defined as

$$w \cdot z + b = 0 \qquad (2)$$

The set $S$ is said to be linearly separable if there exists $(w, b)$ such that the inequalities

$$w \cdot z_i + b \ge 1 \quad \text{if } y_i = 1, \qquad w \cdot z_i + b \le -1 \quad \text{if } y_i = -1 \qquad (3)$$

are valid for all data samples of the set $S$. For a linearly separable set $S$, a unique optimal hyperplane can be found for which the margin between the projections of the training points of the two classes is maximized. To deal with data that are not linearly separable, the previous analysis can be generalized by introducing some non-negative variables $\xi_i \ge 0$ such that Eq. (3) is modified to

$$y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, l \qquad (4)$$

The non-zero $\xi_i$ in Eq. (4) are those for which the data sample $x_i$ does not satisfy Eq. (3). Thus the term $\sum_{i=1}^{l} \xi_i$ can be thought of as a measure of the amount of misclassification. Since the fuzzy membership $\mu_i$ is the attitude of the corresponding sample $x_i$ toward one class, and the parameter $\xi_i$ is the measure of error in the SVM, the term $\mu_i \xi_i$ is a measure of error with different weighting. The optimal hyperplane problem is then regarded as the solution to

$$\begin{aligned} \text{minimize} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \mu_i \xi_i \\ \text{subject to} \quad & y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, l, \\ & \xi_i \ge 0, \quad i = 1, \ldots, l \end{aligned} \qquad (5)$$

where $C$ is a constant. The parameter $C$ can be regarded as a regularization parameter. Tuning this parameter balances the minimization of the error function against the maximization of the margin of the optimal hyperplane. A larger $C$ makes the training of the SVM allow fewer misclassifications and produce a narrower margin, while decreasing $C$ makes the SVM ignore more training points and obtain a wider margin. Note that a smaller $\mu_i$ reduces the effect of the slack variable $\xi_i$, so that the corresponding point $x_i$ is treated as less important. The optimization problem (5) can be solved by introducing Lagrange multipliers and transforming it into:

$$\begin{aligned} \text{maximize} \quad & W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (z_i \cdot z_j) \\ \text{subject to} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le \mu_i C, \quad i = 1, \ldots, l \end{aligned} \qquad (6)$$

and the Kuhn-Tucker conditions are defined as

$$\alpha_i \left( y_i (w \cdot z_i + b) - 1 + \xi_i \right) = 0, \quad i = 1, \ldots, l \qquad (7)$$

$$(\mu_i C - \alpha_i)\, \xi_i = 0, \quad i = 1, \ldots, l \qquad (8)$$

A data sample $x_i$ with corresponding $\alpha_i > 0$ is called a support vector. There are two types of support vectors: the one with corresponding $0 < \alpha_i < \mu_i C$ lies on the margin of the hyperplane, and the one with corresponding $\alpha_i = \mu_i C$ is misclassified. An important difference between SVM and fuzzy SVM is that points with the same value of $\alpha_i$ may indicate different types of support vectors in fuzzy SVM due to the factor $\mu_i$ (Lin and Wang 2002). The mapping $\varphi$ is usually nonlinear and unknown. Instead of calculating $\varphi$, the kernel function $K$ is used to compute the inner product of two vectors in the feature space $Z$, and it thus implicitly defines the mapping function:

$$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) = z_i \cdot z_j \qquad (9)$$

Kernels are one of the core concepts in SVMs and play a very important role. The following are three types of commonly used kernel functions:

$$\text{linear kernel:} \quad K(x_i, x_j) = x_i \cdot x_j \qquad (10)$$

$$\text{polynomial kernel:} \quad K(x_i, x_j) = (1 + x_i \cdot x_j)^p \qquad (11)$$


$$\text{Gaussian kernel:} \quad K(x_i, x_j) = \exp\left( -\|x_i - x_j\|^2 / 2\sigma^2 \right) \qquad (12)$$

where the order $p$ of the polynomial kernel in Eq. (11) and the spread width $\sigma$ of the Gaussian kernel in Eq. (12) are adjustable kernel parameters. The weight vector $w$ and the decision function can be expressed using the Lagrange multipliers $\alpha_i$:

$$w = \sum_{i=1}^{l} \alpha_i y_i z_i \qquad (13)$$

$$D(x) = \operatorname{sign}(w \cdot z + b) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \qquad (14)$$
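For a quick numerical check, the three kernels of Eqs. (10)-(12) can be implemented directly; the sketch below uses NumPy, and the example vectors are arbitrary.

    import numpy as np

    def linear_kernel(xi, xj):
        return np.dot(xi, xj)                               # Eq. (10)

    def polynomial_kernel(xi, xj, p=2):
        return (1.0 + np.dot(xi, xj)) ** p                  # Eq. (11)

    def gaussian_kernel(xi, xj, sigma=1.0):
        d2 = np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)
        return np.exp(-d2 / (2.0 * sigma ** 2))             # Eq. (12)

    xi, xj = np.array([1.0, 0.5]), np.array([0.2, 0.8])
    print(linear_kernel(xi, xj),
          polynomial_kernel(xi, xj),
          gaussian_kernel(xi, xj))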

    References

    Abe, S. and T. Inoue (2002). Fuzzy support vector machines for multiclass problems.

    European Symposium on Artificial Neural Networks Bruges, Belgium.

    Barzilay, O. and V. L. Brailovsky (1999). "On domain knowledge and feature

    selection using a support vector machine." Pattern Recognition Letters 20:

    475-484.

    Berrar, D., I. Bradbury, et al. (2006). "Avoiding model selection bias in small-sample

    genomic datasets." Bioinformatics 22(10): 1245-1250.

Bottou, L., C. Cortes, et al. (1994). Comparison of classifier methods: a case study in handwritten digit recognition. International Conference on Pattern Recognition, IEEE Computer Society Press.

Burges, C. (1998). "A tutorial on support vector machines for pattern recognition." Data Mining and Knowledge Discovery 2(2): 121-167.

    Duan, K.-B. and S. S. Keerthi (2005). "Which is the best multiclass SVM method? An

    empirical study." Multiple Classifier Systems: 278-285.

    Duda, R. O. and P. E. Hart (1973). Pattern classification and scene analysis, Wiley.

Hsu, C.-W., C.-C. Chang, et al. (2003). A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University.

    Hsu, C.-W. and C.-J. Lin (2001). "A comparison of methods for multi-class support

    vector machines." IEEE Transactions on Neural Networks 13: 415-425.

    Hsu, S. H., M. C. Chuang, et al. (2000). "A semantic differential study of designers'

    and users' product form perception." International Journal of Industrial

    Ergonomics 25: 375-391.

    Huang, H.-P. and Y.-H. Liu (2002). "Fuzzy support vector machines for pattern

    recognition and data mining." International Journal of Fuzzy Systems 4(3):

    826-835.

Inoue, T. and S. Abe (2001). Fuzzy support vector machines for pattern classification. Proceedings of the International Joint Conference on Neural Networks (IJCNN '01).


    Jindo, T., K. Hirasago, et al. (1995). "Development of a design support system for

    office chairs using 3-D graphics." International Journal of Industrial

    Ergonomics 15: 49-62.

    Kaul, A. and V. R. Rao (1995). "Research for product positioning and design

    decisions." International Journal of Research in Marketing 12: 293-320.

    Krebel, U. (1999). Pairwise classification and support vector machines. Advances in

    Kernel Methods - Support Vector Learning. B. Scholkopf, J. C. Burges and A.

    J. Smola. Cambridge, MA, MIT Press: 255-268.

    Kwahk, J. and S. H. Han (2002). "A methodology for evaluating the usability of

    audiovisual consumer electronic products." Applied Ergonomics 33: 419-431.

    Lin, C.-F. and S.-D. Wang (2002). "Fuzzy support vector machines." IEEE

    Transactions on Neural Networks 13(2): 464-471.

    Park, J. and S. H. Han (2004). "A fuzzy rule-based approach to modeling affective

    user satisfaction towards office chair design." International Journal of

    Industrial Ergonomics 34: 31-47.

    Platt, J. C., N. Cristianini, et al. (2000). Large margin DAGs for multiclass

    classification. Advances in Neural Information Processing Systems, MIT

    Press.

Keerthi, S. S. and C.-J. Lin (2003). "Asymptotic behaviors of support vector machines with Gaussian kernel." Neural Computation 15(7): 1667-1689.

Scholkopf, B., I. Guyon, et al. (2001). Statistical learning and kernel methods in bioinformatics. San Miniato.

    Shimizu, Y. and T. Jindo (1995). "A fuzzy logic analysis method for evaluating human

    sensitivities." International Journal of Industrial Ergonomics 15: 39-47.

Smits, G. F. and E. M. Jordaan (2002). Improved SVM regression using mixtures of kernels. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN '02).

Vapnik, V. N. (1995). The nature of statistical learning theory. New York, Springer.

    Wang, W., Z. Xu, et al. (2003). "Determination of the spread parameter in the

    Gaussian kernel for classification and regression." Neurocomputing 55:

    643-663.