Consumers’ Preferences Modeling With Multiclass Fuzzy Support Vector Machines




    Chih-Chieh Yang

    Department of Multimedia and Entertainment Science, Southern Taiwan University

    No. 1, Nantai Street, Yongkang City, Tainan County, Taiwan 71005

    Meng-Dar Shieh

Department of Industrial Design, National Cheng Kung University, Tainan, Taiwan 70101

    Abstract

Consumers' preferences toward product design are often affected by a large number of form features. It is very important for product designers to understand the relationship between consumers' preferences and product form features. In this paper, an approach based on multiclass fuzzy support vector machines (multiclass fuzzy SVM) is proposed to construct a prediction model of consumers' preferences. Product samples are collected and their form features are systematically examined. Each product sample is assigned a class label and a fuzzy membership expressing the degree of agreement with this label, formulating a multiclass classification problem. A one-versus-one multiclass fuzzy SVM model is constructed using the collected product samples. The optimal training parameter set of the model is determined by a two-step cross-validation. A case study of mobile phone design is also given to demonstrate the effectiveness of the proposed methodology. Two standard kernel functions, the polynomial kernel and the Gaussian kernel, are used and their performance is compared. The experimental results show that the Gaussian kernel model performs better than the polynomial model; it performed very well and is capable of preventing overfitting.

Keywords: Consumers' preferences; Multiclass fuzzy support vector machines; Mobile phone design

    1. Introduction

The appearance of a product is one of the most important factors affecting consumers' purchase decisions. Traditionally, the quality of product form design depends heavily on designers' intuition, which is not proven to lead to success in the


marketplace. In order to understand consumers' preferences and develop appealing products in a more effective manner, much research has been done to study product form design with systematic approaches. The most noticeable research is Kansei Engineering, proposed by (Jindo, Hirasago et al. 1995). The main issue is how to deal with the correlations between attributes and take care of the nonlinear properties of attributes (Shimizu and Jindo 1995; Park and Han 2004). The techniques most widely adopted in the product design field, such as multiple regression analysis (Park and Han 2004) or multivariate analysis (Shimizu and Jindo 1995), depend heavily on the assumptions of independence and linearity and hence cannot deal with the nonlinearity of the relationship effectively. In addition, prior to establishing a mathematical model, data simplification and variable screening are often needed to obtain better results (Hsu, Chuang et al. 2000). Fuzzy regression analysis (Shimizu and Jindo 1995) and other methods suffer from the same shortcomings (Park and Han 2004).

(Vapnik 1995) developed a new kind of algorithm called the support vector machine (SVM). SVM has been shown to provide higher performance than traditional learning techniques (Burges 1998). SVMs' remarkable and robust performance with respect to sparse and noisy data makes them a first choice in a number of applications such as pattern recognition (Burges 1998) and bioinformatics (Scholkopf, Guyon et al. 2001). SVM is known for its elegance in solving nonlinear problems with the technique of kernels, which automatically perform a nonlinear mapping to a feature space. As a consequence, the nonlinear relationship between product form features can be processed effectively by introducing a suitable kernel function.

This study proposes an approach based on multiclass fuzzy SVM for modeling consumers' preferences. The approach begins by processing product forms with discrete and continuous attributes and can also deal with sparse feature vectors. Each product sample is assigned a class label and a fuzzy membership describing the semantic differential score of agreement with this label. A one-versus-one multiclass fuzzy SVM model is constructed using the collected product samples. The optimal training parameter set of the model is determined by a two-step cross-validation. The remainder of the paper is organized as follows. Section 2 gives an introduction to multiclass fuzzy SVM. Section 3 presents the proposed prediction model of consumers' preferences. Section 4 demonstrates the experimental results of the proposed model using mobile phone design as an example. Finally, Section 5 presents some brief conclusions and suggestions for future work.

    2. Multiclass fuzzy support vector machines

    2.1. Fuzzy support vector machines for binary classification


An SVM maps the input points into a high-dimensional feature space and finds a separating hyperplane that maximizes the margin between the two classes in that space. Maximizing the margin is a quadratic programming (QP) problem that can be solved via its dual problem by introducing Lagrange multipliers. Without any explicit knowledge of the mapping, the SVM finds the optimal hyperplane by using dot products in the feature space with the aid of kernels. The solution of the optimal hyperplane can be written as a combination of a few input points, which are called support vectors.

In many real-world applications, input samples may not be exactly assigned to one class, and the effects of the training samples might differ. Some are more important and should be fully assigned to one class so that the SVM can separate them more correctly. Other samples might be noisy and less meaningful and should be discarded. Treating every data sample equally may cause overfitting. The original SVM lacks this kind of ability. (Huang and Liu 2002; Lin and Wang 2002) proposed the concept of fuzzy SVM, which combines fuzzy logic and SVM to make different training samples contribute differently to their own class. The core of their concept is to fuzzify the training set and assign each data sample a membership value according to its relative importance in the class. A description of fuzzy SVM is given in the Appendix.
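To make the idea concrete, the following minimal sketch emulates fuzzy-SVM-style training with scikit-learn; this is an implementation choice made here for illustration, not a tool used in the paper. The per-sample weights passed to sample_weight scale the penalty C of each sample, which mirrors the $\mu_i C$ upper bound on $\alpha_i$ in the fuzzy SVM dual (see the Appendix). The data are synthetic placeholders.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
    y = np.array([-1] * 20 + [+1] * 20)
    mu = rng.uniform(0.3, 1.0, size=40)    # fuzzy memberships in (0, 1]

    clf = SVC(kernel="linear", C=10.0)
    clf.fit(X, y, sample_weight=mu)        # effective penalty becomes mu_i * C

    print(clf.support_)                    # indices of the support vectors
    print(clf.decision_function(X[:3]))    # D(x) before taking the sign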

Figure 1 illustrates a simplified binary classification problem with only two attributes, trained by fuzzy SVM using a linear kernel. Since all data samples have only two attributes, the data points can be plotted in the 2D plane to explain the training results in a more intuitive manner. Red and blue disks are the two classes of training samples. Grey values indicate the value of the argument $\sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b$ of Eq. (14) in the Appendix. A new data sample without a given class label can then be discriminated according to Eq. (14). In Figure 1 the middle solid line is the decision surface; data points lying on this surface satisfy $D(x) = 0$. The outer dashed lines precisely meet the constraint of Eq. (3) in the Appendix, and data points lying on this margin satisfy $D(x) = 1$ or $D(x) = -1$.

In addition, support vectors are very useful in data analysis and interpretation. In the original definition of SVM, the data points satisfying the condition $\alpha_i > 0$ are called support vectors. In fuzzy SVM, the same value of $\alpha_i$ may indicate a different type of support vector due to the factor $\mu_i$ (Lin and Wang 2002). The one with corresponding $\alpha_i = \mu_i C$ is misclassified. The one with corresponding $0 < \alpha_i < \mu_i C$ lies on the margin of the hyperplane (marked by extra circles in Figure 1).

Different fuzzy memberships were applied to the two classes to demonstrate the training effect of $\mu_i$. In Figure 1(a), both classes are assigned memberships equal to 1.


Figure 2. The relationship between parameter C and the training margin (polynomial kernel with degree 2); panels show C = 10000, 1000, 100, 10, 1, and 0.1.

    2.2 One-versus-one multiclass support vector machines

In the previous section we described the concept of fuzzy SVM. However, fuzzy SVM is still limited to binary classification. How to effectively extend SVM to multiclass classification is still an ongoing research issue. (Hsu and Lin 2001; Duan and Keerthi 2005) compared the performance of several multiclass SVM methods based on binary SVM, including one-versus-rest (OVR) (Bottou, Cortes et al. 1994), one-versus-one (OVO) (Krebel 1999), and directed acyclic graph SVM (DAGSVM) (Platt, Cristianini et al. 2000).


Two characteristics of product form features are considered in this study. Firstly, the form feature vector is often sparse. This is mainly because there often exist large numbers of features to represent product form design, and each product sample does not necessarily occupy all form features. The number of active or non-zero features in a feature vector is lower than the total number of features. This situation is very common in product form feature representation (Kwahk and Han 2002). Secondly, product form features are often mixed, with two kinds of attributes denoted as discrete or continuous. Discrete attributes denote categorical choices among a fixed number of options, such as types of texture, material used in parts, etc. Continuous attributes, such as length and proportion, have some kind of scale or can be measured, and the domain of the variable is continuous without interruption. SVM can deal with mixed discrete and continuous attributes at the same time. Since SVM requires that each data sample be represented as a vector of real numbers, discrete attributes can be represented as integer numbers. Taking a three-category attribute {circle, rectangle, triangle} for example, it can be coded as {1, 2, 3}. As for continuous attributes, because kernel values usually depend on the inner products of feature vectors (e.g. the linear kernel and polynomial kernel), large attribute values might cause numerical problems (Hsu, Chang et al. 2003). Continuous attributes are therefore linearly scaled to the range [0, 1] to avoid numerical difficulties during calculation; a minimal sketch of this preparation is given below.
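The following sketch illustrates the encoding and scaling just described. The data values and attribute names are hypothetical, and NumPy is an implementation convenience, not something prescribed by the paper.

    import numpy as np

    # Discrete attribute: categorical choices coded as integers.
    body_type = {"block": 1, "flip": 2, "slide": 3}
    samples = [
        {"length": 105.0, "width": 45.0, "body": "block"},
        {"length": 88.0,  "width": 42.0, "body": "slide"},
        {"length": 97.0,  "width": 48.0, "body": "flip"},
    ]

    def scale01(v):
        # Linearly scale a continuous attribute to the range [0, 1].
        v = np.asarray(v, dtype=float)
        return (v - v.min()) / (v.max() - v.min())

    X = np.column_stack([
        scale01([s["length"] for s in samples]),
        scale01([s["width"] for s in samples]),
        [body_type[s["body"]] for s in samples],
    ])
    print(X)  # each row is one product sample's feature vector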

    3.2. Describing consumers preferences using class labels

The concept of product positioning was borrowed to describe consumers' preferences toward product form design. (Kaul and Rao 1995) suggested that a company should provide an array of products in the marketplace in order to meet the needs of each homogeneous consumer segment. Vice versa, consumers often make choices in the marketplace according to the perceived product attributes. Based on this idea, product samples are assumed to be distinguished by consumers and classified into different groups. Managerial decisions can be made more effectively by identifying the relative importance attached to various product attributes. Taking mobile phone design for example, class labels such as sports, simplicity, female, plain, and business are used to describe the different product divisions provided in the marketplace. Although other product characteristics (brand, price, etc.) may affect consumers' subjective perceptions, the authors emphasize only the factors in product form design. Other marketing strategies that may influence the decisions of consumers are beyond the scope of this study.

    3.3. Collecting product samples


A total of 69 mobile phones were collected from the Taiwan market in 2006. Three product designers, each with at least 5 years of experience, conducted the product form feature analysis. They first examined the main component structure using the method proposed in (Kwahk and Han 2002) and then used this structure to analyze all product samples. The form features of each product sample were discussed by all designers to determine one unified representation. Continuous attributes were recorded directly, while discrete attributes were processed by the method described in Section 3.1. The color and texture information of the product samples was ignored; only the form features were considered. All entries in the feature matrix were prepared for training the multiclass fuzzy SVM. Five class labels, sports, simplicity, female, plain, and business, were chosen for semantic evaluations. In order to collect consumers' perception data for mobile phone design, 30 subjects, including 15 males and 15 females, were asked to evaluate all product samples using the five selected class labels. Each subject was asked to choose the most suitable class label for representing each product sample, and to evaluate each sample on a semantic differential scale from 0 (very low) to 1 (very high). Since each product sample had only a single instance when training the multiclass fuzzy SVM model, the most frequently assigned label was used to represent each product sample. Training multiple instances of samples is another interesting issue worthy of further research. The selected class label is assigned +1, and the rest of the labels are assigned -1. The semantic differential score is directly stored as the membership value for fuzzy SVM training; a sketch of this aggregation follows.
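As an illustration, the sketch below aggregates hypothetical subject votes for one product sample. The paper does not specify how the 30 subjects' scores are combined into one membership value; averaging the scores of the winning label is an assumption made here.

    from collections import Counter
    import numpy as np

    # Hypothetical votes: (chosen label, semantic differential score in [0, 1]).
    votes = [("plain", 0.5), ("plain", 0.6), ("female", 0.8), ("plain", 0.4)]

    winner, _ = Counter(label for label, _ in votes).most_common(1)[0]
    membership = np.mean([s for label, s in votes if label == winner])  # assumption

    labels = ["sports", "simplicity", "female", "plain", "business"]
    y = [+1 if name == winner else -1 for name in labels]
    print(winner, y, membership)  # plain [-1, -1, -1, +1, -1] 0.5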

    3.4. Constructing multiclass fuzzy SVM model

In this study, each product sample is assigned a class label to formulate a multiclass classification problem. This problem is then divided into a series of OVO SVM sub-problems. The objective of multiclass classification is to correctly discriminate these classes from each other, and each OVO problem addresses a pair of different class labels (e.g. sports versus simplicity). Each classifier uses the fuzzy SVM to define a hyperplane that best separates product samples into two classes. Each test sample is sequentially presented to each of the $5 \times (5-1)/2 = 10$ OVO classifiers, and its label is predicted as the class receiving the largest vote, as sketched below.
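The following minimal sketch shows the OVO training and voting scheme, assuming scikit-learn. The feature matrix and labels are random placeholders, and the kernel parameters reuse the values reported later in Section 4.3 purely for illustration.

    from itertools import combinations
    import numpy as np
    from sklearn.svm import SVC

    labels = ["sports", "simplicity", "female", "plain", "business"]
    rng = np.random.default_rng(1)
    X = rng.normal(size=(69, 6))       # placeholder feature matrix
    y = rng.integers(0, 5, size=69)    # placeholder class indices

    classifiers = {}
    for a, b in combinations(range(5), 2):               # 5 * (5-1) / 2 = 10 pairs
        mask = (y == a) | (y == b)
        clf = SVC(kernel="rbf", C=40, gamma=1 / (2 * 4))  # sigma^2 = 4 (Sec. 4.3)
        clf.fit(X[mask], y[mask])
        classifiers[(a, b)] = clf

    def predict(x):
        votes = np.zeros(5, dtype=int)
        for clf in classifiers.values():
            votes[clf.predict(x.reshape(1, -1))[0]] += 1  # one vote per pair
        return labels[votes.argmax()]                     # largest vote wins

    print(predict(X[0]))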

    3.5. Choosing optimal parameters using cross-validation


Since the number of product samples is limited, it is important to obtain the best generalization performance and reduce overfitting. A practical implementation is to partition the data samples into training data and testing data. Various partition strategies have been proposed, including leave-one-out cross-validation (LOOCV), k-fold cross-validation, repeated random subsampling, and bootstrapping (Berrar, Bradbury et al. 2006). In this study, 5-fold cross-validation is used to choose the optimal parameters. The whole set of training samples is randomly divided into five subsets of approximately equal size. Each multiclass model is trained using $5 - 1 = 4$ subsets and tested using the remaining subset. Training is repeated five times, and the average testing error rate over the five held-out subsets is calculated, as in the sketch below.
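A minimal sketch of this 5-fold scheme, assuming scikit-learn and placeholder data; the fixed (C, gamma) values stand in for one candidate parameter set.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.svm import SVC

    rng = np.random.default_rng(2)
    X = rng.normal(size=(69, 6))       # placeholder data
    y = rng.integers(0, 5, size=69)

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    errors = []
    for train_idx, test_idx in kf.split(X):
        clf = SVC(kernel="rbf", C=10, gamma=0.1)   # one candidate parameter set
        clf.fit(X[train_idx], y[train_idx])        # train on 4 subsets
        errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

    print("average CV error rate:", np.mean(errors))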

The performance of the SVM model is heavily dependent on the regularization parameter $C$ and the parameter of the chosen kernel function. Taking the Gaussian kernel for example, each binary classifier requires the selection of two parameters: the regularization parameter $C$ and the kernel parameter $\sigma^2$. The $C$ and $\sigma^2$ of each classifier within the multiclass model are set to the same values for computational efficiency. Since cross-validation may be very time-consuming, a two-step grid search is conducted to find the optimal hyperparameter pair (Hsu, Chang et al. 2003). In the first step, a coarse grid search is performed using the following sets of values: $C \in \{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$ and $\sigma^2 \in \{10^{-3}, 10^{-2}, \ldots, 10^{3}\}$. Thus 49 combinations of $C$ and $\sigma^2$ are tried in this step. An optimal pair $(C_0, \sigma_0^2)$ is selected from the coarse grid search. In the second step, a fine grid search is conducted around $(C_0, \sigma_0^2)$, where $C \in \{0.2C_0, 0.4C_0, \ldots, 0.8C_0, C_0, 2C_0, 4C_0, \ldots, 8C_0\}$ and $\sigma^2 \in \{0.2\sigma_0^2, 0.4\sigma_0^2, \ldots, 0.8\sigma_0^2, \sigma_0^2, 2\sigma_0^2, 4\sigma_0^2, \ldots, 8\sigma_0^2\}$. Altogether, 81 combinations of $C$ and $\sigma^2$ are tried in this step, and the optimal hyperparameter pair is selected from this fine search. Likewise, the same two-step grid search is repeated for the polynomial kernel. For the polynomial kernel, the coarse grid is taken as $C \in \{10^{-3}, \ldots, 10^{3}\}$ and $p \in \{1, 2, \ldots, 5\}$. When $(C_0, p_0)$ is determined, the range of the fine grid search is $C \in \{0.2C_0, 0.4C_0, \ldots, 0.8C_0, C_0, 2C_0, 4C_0, \ldots, 8C_0\}$ and $p \in \{0.2p_0, 0.4p_0, \ldots, 0.8p_0, p_0, 1.2p_0, 1.4p_0, \ldots, 1.8p_0\}$.

    After comparing the performance of all training models using different kernel


    functions and parameters, the best combination of parameters obtained by

    cross-validation is used to build the multiclass fuzzy SVM model.
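A minimal sketch of the two-step search, written around a hypothetical cv_error(C, sigma2) helper that would return the 5-fold cross-validation error rate of the multiclass model for one parameter pair:

    import numpy as np

    def two_step_search(cv_error):
        # Step 1: coarse grid, 7 x 7 = 49 combinations.
        coarse = [10.0 ** k for k in range(-3, 4)]
        C0, s0 = min(((C, s) for C in coarse for s in coarse),
                     key=lambda pair: cv_error(*pair))
        # Step 2: fine grid around (C0, s0), 9 x 9 = 81 combinations.
        factors = [0.2, 0.4, 0.6, 0.8, 1.0, 2.0, 4.0, 6.0, 8.0]
        return min(((f * C0, g * s0) for f in factors for g in factors),
                   key=lambda pair: cv_error(*pair))

    # Toy error surface standing in for the real cross-validation procedure:
    best = two_step_search(
        lambda C, s2: abs(np.log10(C) - 1.6) + abs(np.log10(s2) - 0.6))
    print(best)  # (40.0, 4.0) for this toy surface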

    4. Experimental results

    4.1. Data set

Mobile phone design was selected to demonstrate the proposed methodology. Table 1 shows a part of the product samples used in this study. The set $S_i\ (i = 1, 2, \ldots, 10)$ represents a part of the product samples to be analyzed; the set $X_i\ (i = 1, 2, \ldots, 6)$ denotes the product form feature attributes; the set $Y_i\ (i = 1, 2, \ldots, 5)$ represents the class labels; and $\mu$ is the membership value of the +1 class label of each product sample $S_i$. For the sake of simplicity, only six product form features are listed in the example of Table 1:

$X = \{X_1, X_2, X_3, X_4, X_5, X_6\}$ = {body length, body width, body thickness, body volume, body type, function button type}

Five class labels are used to describe consumers' subjective perceptions of mobile phone design. These class labels are listed as follows:

$Y = \{Y_1, Y_2, Y_3, Y_4, Y_5\}$ = {sports, simplicity, female, plain, business}

Taking product sample $S_1$ as an example, the consumer chose label $Y_4$ (plain) and the attitude toward $Y_4$ is 0.5. A complete list of all product form features is shown in Table 2.

Table 1. Part of the training product samples for mobile phone design.

    Sample  X1    X2    X3    X4    X5  X6  |  Y1  Y2  Y3  Y4  Y5  |  μ
    S1      0.75  0.45  0.72  0.62  2   3   |  -1  -1  -1  +1  -1  |  0.5
    S2      0.67  0.43  0.64  0.47  3   3   |  -1  -1  +1  -1  -1  |  0.8
    S3      0.79  0.42  0.57  0.48  1   3   |  +1  -1  -1  -1  -1  |  0.5
    S4      0.75  0.44  0.60  0.50  3   3   |  -1  -1  -1  +1  -1  |  0.6
    S5      0.67  0.42  0.77  0.54  2   2   |  -1  -1  +1  -1  -1  |  0.6
    S6      0.72  0.48  0.53  0.47  2   3   |  -1  -1  -1  +1  -1  |  0.9
    S7      1.00  0.44  0.56  0.63  1   1   |  -1  -1  +1  -1  -1  |  0.7
    S8      0.77  0.45  0.81  0.71  2   1   |  -1  -1  +1  -1  -1  |  1.0
    S9      0.75  0.45  0.72  0.62  2   3   |  -1  -1  +1  -1  -1  |  0.6
    S10     0.67  0.43  0.64  0.47  3   3   |  -1  -1  +1  -1  -1  |  0.8

Columns X1-X6 are the product form features; Y1-Y5 are the class labels; μ is the membership value of the +1 label.


Table 2. Complete list of product form features used in this study.

    Component        Form feature            Type        Attributes
    Body             Length (X1)             Continuous  None
                     Width (X2)              Continuous  None
                     Thickness (X3)          Continuous  None
                     Volume (X4)             Continuous  None
                     Type (X5)               Discrete    Block body (X51), Flip body (X52), Slide body (X53)
    Function button  Type (X6)               Discrete    (X61), (X62), (X63)
                     Style (X7)              Discrete    Round (X71), Square (X72), (X73)
    Number button    Shape (X8)              Discrete    Circular (X81), Regular (X82), Asymmetric (X83)
                     Arrangement (X9)        Discrete    Square (X91), Vertical (X92), Horizontal (X93)
                     Detail treatment (X10)  Discrete    (X101), (X102), (X103), (X104)
    Panel            Position (X11)          Discrete    Middle (X111), Upper (X112), Lower (X113), Full (X114)
                     Shape (X12)             Discrete    Square (X121), Fillet (X122), Shield (X123), Round (X124)


    4.2. Training effect of different kernel functions

The training effects of the polynomial kernel and the Gaussian kernel were investigated using the whole set of product samples. Average training accuracies of the kernel functions and the corresponding parameters are shown in Figure 3.

For the polynomial kernel in Figure 3(a), the average error rates of the linear kernel ($p = 1$) were larger than 40% for all values of parameter $C$. When $p = 2$, the regulating effect of parameter $C$ was most obvious: as $C$ decreased from 1000 to 0.001, the average error rate increased from 0% to 34.8%. This is because parameter $C$ adjusts the margin of the optimal hyperplane: training with a smaller $C$ results in a larger margin, so the training error can also increase. Parameter $C$ had a similar regulating effect when $p = 3$; however, the training error rate increased more drastically than for $p = 2$. Although the training accuracies of the polynomial kernel with $p > 1$ were all superior to the linear kernel, these models might suffer from overfitting and have poor generalization ability.

For the Gaussian kernel in Figure 3(b), the regulating effect of parameter $C$ was less pronounced than for the polynomial kernel across all kernel parameters $\sigma$. It has been reported in (Wang, Xu et al. 2003) that both too large and too small values of $\sigma$ lead to poor generalization performance. Our results exhibited similar effects of $\sigma$. For large values of $\sigma^2$, all training data were regarded as one data point; as a consequence, the training model cannot recognize new data and the training error rate is very high. On the other hand, for small values of $\sigma^2$, all training data were regarded as support vectors and could be separated correctly, so the training error rate declined dramatically. However, for untrained data, such a model may not give good results due to overfitting.

In general, the linear kernel performed worse than the nonlinear kernels. Unlike the linear kernel, the polynomial kernel and Gaussian kernel are capable of nonlinearly mapping the training samples into a higher-dimensional space, so they can handle the case where the relation between product form features and class labels is nonlinear. Since each kernel function has different properties and generalization performance, the advantages of different kernel functions can be combined by using mixtures of kernels (Smits and Jordaan 2002). In addition, there exist theorems that can help to build kernel functions that take domain knowledge into consideration (Barzilay and Brailovsky 1999); these issues are beyond the scope of this paper.


Figure 3. Average training accuracies using (a) the polynomial kernel and (b) the Gaussian kernel.


    4.3. Analysis of cross-validation process

In order to obtain the best performance and reduce overfitting of the training model, a two-step cross-validation process was used to determine the optimal parameters. Figure 4 shows the results of cross-validation for the polynomial kernel. The best parameter set $(C_0, p_0)$ obtained from the first step of the coarse grid search was $(100, 1)$, chosen for the lowest error rate of 71%. The optimal parameter pair $(C, p)$ obtained in the second step of the fine grid search was $(800, 1)$. The average error rate of the second step improved slightly to 68.1%. The results of cross-validation for the Gaussian kernel are shown in Figure 5. The best parameter set $(C_0, \sigma_0^2)$ obtained from the coarse grid search was $(10, 10)$. The optimal parameter set $(C, \sigma^2)$ obtained in the fine grid search was $(40, 4)$. The error rate also improved slightly, from 73.9% in the first step to 72.4% in the second step.

As shown in the previous section, if the training model is built with the whole data set using one of the parameter sets from the region with very low average error rates ($< 10\%$) in Figure 3, the training model can hardly avoid overfitting. An interesting result is that the best parameter sets obtained by cross-validation for both kernel functions seem to lie on the boundary of the region with very low average error rates. This indicates that the cross-validation process is capable of balancing the trade-off between improving training accuracy and preventing overfitting. Since the purpose of cross-validation is to search for the best combination of parameters, the accuracy of the individual training models in this process is not our concern, regardless of their high error rates (all larger than 65%). The optimal parameters of the polynomial kernel and Gaussian kernel obtained from cross-validation were then used to build the final training models.


Figure 4. Average training accuracy of cross-validation in (a) the coarse grid and (b) the fine grid using the polynomial kernel.


Figure 5. Average training accuracy of cross-validation in (a) the coarse grid and (b) the fine grid using the Gaussian kernel.


    4.4. Performance of the optimal training model

The best parameter sets of the polynomial kernel and Gaussian kernel obtained from the cross-validation process were both used to build multiclass fuzzy SVM training models. The average accuracy rate of the polynomial kernel model with $(C, p) = (800, 1)$ was 66.3%, while the average accuracy rate of the Gaussian kernel model with $(C, \sigma^2) = (40, 4)$ was 98.6%. Confusion matrices are used for further analysis, as shown in Table 3. Diagonal elements are the numbers of correctly classified samples, while off-diagonal elements indicate the numbers of misclassified samples. For the polynomial kernel model in Table 3(a), the most confusing class was female: more than half of the female samples were misclassified as plain, sports, or simplicity, and the accuracy was down to 20%. According to our observation, two characteristics of female product samples are the area of decoration and the color of the body. Since the color and texture of the product samples were ignored, these samples may not provide enough information for the polynomial kernel model to classify them correctly. For the Gaussian kernel model in Table 3(b), the model performed very well and had only one misclassified sample. The performance of the Gaussian kernel model with parameter set $(C, \sigma^2) = (40, 4)$ was better than that of the polynomial kernel model.

Table 3. Confusion matrices and accuracy rates of the optimal training models obtained from (a) the polynomial kernel and (b) the Gaussian kernel.

    (a)                          Predicted class
    Actual class   plain  sports  female  simplicity  business  Accuracy rate (%)
    plain           10      3       0        1           0          71.4
    sports           1     15       0        1           0          88.2
    female           1      5       2        2           0          20.0
    simplicity       1      0       0       15           1          88.2
    business         1      1       0        2           7          63.6
    Average accuracy rate                                           66.3

    (b)                          Predicted class
    Actual class   plain  sports  female  simplicity  business  Accuracy rate (%)
    plain           13      0       1        0           0          92.9
    sports           0     17       0        0           0         100.0
    female           0      0      10        0           0         100.0
    simplicity       0      0       0       17           0         100.0
    business         0      0       0        0          11         100.0
    Average accuracy rate                                           98.6
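For reference, the per-class and average accuracy rates in Table 3(a) can be reproduced from the confusion matrix itself; the sketch below, using NumPy as an implementation convenience, shows that the reported average is the mean of the per-class rates.

    import numpy as np

    labels = ["plain", "sports", "female", "simplicity", "business"]
    cm = np.array([[10,  3, 0,  1, 0],   # rows: actual class
                   [ 1, 15, 0,  1, 0],   # cols: predicted class
                   [ 1,  5, 2,  2, 0],
                   [ 1,  0, 0, 15, 1],
                   [ 1,  1, 0,  2, 7]])

    per_class = cm.diagonal() / cm.sum(axis=1)   # correct / total per actual class
    for name, acc in zip(labels, per_class):
        print(f"{name:>10}: {100 * acc:.1f}%")
    print(f"   average: {100 * per_class.mean():.1f}%")  # 66.3%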


5. Conclusions and future work

In this paper, an approach based on multiclass fuzzy SVM is proposed to develop a prediction model of consumers' preferences. The OVO multiclass fuzzy SVM model can deal with the nonlinear relationship between product form features by introducing a kernel function. The optimal training parameters were determined by a two-step cross-validation process. According to the experimental results on mobile phone design, the optimal training model was obtained by choosing the Gaussian kernel model with the lowest average cross-validation error rate of 72.4%. The parameter set $(C, \sigma^2)$ of the optimal training model was $(40, 4)$. The optimal Gaussian kernel model trained with all product samples also had a very high accuracy of 98.6%. As a consequence, the Gaussian kernel model is superior to the polynomial model. This result is consistent with the fact that the Gaussian kernel is popular and commonly used in many applications due to its good properties. Further discussions of the properties of the Gaussian kernel can be found in (Keerthi and Lin 2003; Wang, Xu et al. 2003).

Since our case study was developed on mobile phone design and used a relatively small number of product form features, the form features of other product categories such as consumer electronics, furniture, car design, etc. may have different characteristics to consider. A more comprehensive collection of different product samples is needed to study the effectiveness of the proposed multiclass fuzzy SVM model. Extending standard kernel functions such as the polynomial kernel and Gaussian kernel by considering the characteristics of product form features is also a very interesting issue and requires further study.

    Appendix: Fuzzy support vector machines

For a binary classification problem, a set $S$ of $l$ training samples is given, each represented as $(x_i, y_i, \mu_i)$, where $x_i$ is the feature vector, $y_i$ is the class label, and $\mu_i$ is the fuzzy membership. Each training sample belongs to one of two classes: each is given a label $y_i \in \{+1, -1\}$ and a fuzzy membership $\epsilon \le \mu_i \le 1$ with $i = 1, \ldots, l$, for sufficiently small $\epsilon > 0$. A data sample with $\mu_i = 0$ contributes nothing and can be removed from the training set without affecting the result. These training samples can be used to build a decision function (or discriminant function) $D(x)$, which is a scalar function of an input sample. Decision functions that are simple weighted sums of the input vector plus a bias are called linear discriminant functions (Duda and Hart 1973), denoted as


$$D(x) = w \cdot x + b \qquad (1)$$

where $w$ is the weight vector and $b$ is a bias value. $D(x) = 0$ can be seen as a hyperplane; $w$ is the normal vector of the separating plane, and the bias term $b$ is the offset of the hyperplane along its normal vector. A data set is said to be linearly separable if a linear discriminant function can separate it without error. In most cases, finding a suitable linear discriminant function is too restrictive to be of practical use. A solution to this situation is to map the original input space into a higher-dimensional feature space and search for the optimal hyperplane in that feature space. Let $z_i = \varphi(x_i)$ denote the corresponding feature space vector, with a mapping function $\varphi$ from $R^N$ to a feature space $Z$. The hyperplane can be defined as

$$w \cdot z + b = 0 \qquad (2)$$

The set $S$ is said to be linearly separable if there exists $(w, b)$ such that the inequalities

$$w \cdot z_i + b \ge 1 \quad \text{if } y_i = 1, \qquad w \cdot z_i + b \le -1 \quad \text{if } y_i = -1 \qquad (3)$$

are valid for all data samples of the set $S$. For a linearly separable set $S$, a unique optimal hyperplane can be found for which the margin between the projections of the training points of the two classes is maximized. To deal with data that are not linearly separable, the previous analysis can be generalized by introducing some non-negative variables $\xi_i \ge 0$ such that Eq. (3) is modified to

$$y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, l \qquad (4)$$

The non-zero $\xi_i$ in Eq. (4) are those for which the data sample $x_i$ does not satisfy Eq. (3). Thus the term $\sum_{i=1}^{l} \xi_i$ can be thought of as a measure of the amount of misclassification. Since the fuzzy membership $\mu_i$ is the attitude of the corresponding sample $x_i$ toward one class, and the parameter $\xi_i$ is the measure of error in the SVM, the term $\mu_i \xi_i$ is a measure of error with different weighting. The optimal hyperplane problem is then regarded as the solution to

$$\begin{aligned} \text{minimize} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \mu_i \xi_i \\ \text{subject to} \quad & y_i (w \cdot z_i + b) \ge 1 - \xi_i, \quad i = 1, \ldots, l, \\ & \xi_i \ge 0, \quad i = 1, \ldots, l \end{aligned} \qquad (5)$$

where $C$ is a constant. The parameter $C$ can be regarded as a regularization parameter. Tuning this parameter balances the minimization of the error function against the maximization of the margin of the optimal hyperplane. A larger $C$ makes the training of the SVM allow fewer misclassifications and produce a narrower margin, while decreasing $C$ makes the SVM ignore more training points and obtain a wider margin. Note that a smaller $\mu_i$ reduces the effect of the slack variable $\xi_i$, so that the corresponding point $x_i$ is treated as less important. The optimization problem (5) can be solved by introducing Lagrange multipliers and transforming it into:

$$\begin{aligned} \text{maximize} \quad & W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j (z_i \cdot z_j) \\ \text{subject to} \quad & \sum_{i=1}^{l} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le \mu_i C, \quad i = 1, \ldots, l \end{aligned} \qquad (6)$$

and the Kuhn-Tucker conditions are defined as

$$\alpha_i \left( y_i (w \cdot z_i + b) - 1 + \xi_i \right) = 0, \quad i = 1, \ldots, l \qquad (7)$$

$$(\mu_i C - \alpha_i)\, \xi_i = 0, \quad i = 1, \ldots, l \qquad (8)$$

A data sample $x_i$ with corresponding $\alpha_i > 0$ is called a support vector. There are two types of support vectors: the one with corresponding $0 < \alpha_i < \mu_i C$ lies on the margin of the hyperplane, and the one with corresponding $\alpha_i = \mu_i C$ is misclassified. An important difference between SVM and fuzzy SVM is that points with the same value of $\alpha_i$ may indicate different types of support vectors in fuzzy SVM due to the factor $\mu_i$ (Lin and Wang 2002). The mapping $\varphi$ is usually nonlinear and unknown. Instead of calculating $\varphi$, the kernel function $K$ is used to compute the inner product of two vectors in the feature space $Z$, and it thus implicitly defines the mapping function:

$$K(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j) = z_i \cdot z_j \qquad (9)$$

Kernels are one of the core concepts in SVMs and play a very important role. The following are three types of commonly used kernel functions:

$$\text{linear kernel:} \quad K(x_i, x_j) = x_i \cdot x_j \qquad (10)$$

$$\text{polynomial kernel:} \quad K(x_i, x_j) = (1 + x_i \cdot x_j)^p \qquad (11)$$


$$\text{Gaussian kernel:} \quad K(x_i, x_j) = \exp\left( -\|x_i - x_j\|^2 / 2\sigma^2 \right) \qquad (12)$$

where the order $p$ of the polynomial kernel in Eq. (11) and the spread width $\sigma$ of the Gaussian kernel in Eq. (12) are adjustable kernel parameters. The weight vector $w$ and the decision function can be expressed using the Lagrange multipliers $\alpha_i$:

$$w = \sum_{i=1}^{l} \alpha_i y_i z_i \qquad (13)$$

$$D(x) = \operatorname{sign}(w \cdot z + b) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right) \qquad (14)$$
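For a quick numerical check, the three kernels of Eqs. (10)-(12) can be implemented directly; the sketch below uses NumPy, and the example vectors are arbitrary.

    import numpy as np

    def linear_kernel(xi, xj):
        return np.dot(xi, xj)                               # Eq. (10)

    def polynomial_kernel(xi, xj, p=2):
        return (1.0 + np.dot(xi, xj)) ** p                  # Eq. (11)

    def gaussian_kernel(xi, xj, sigma=1.0):
        d2 = np.sum((np.asarray(xi) - np.asarray(xj)) ** 2)
        return np.exp(-d2 / (2.0 * sigma ** 2))             # Eq. (12)

    xi, xj = np.array([1.0, 0.5]), np.array([0.2, 0.8])
    print(linear_kernel(xi, xj),
          polynomial_kernel(xi, xj),
          gaussian_kernel(xi, xj))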

    References

    Abe, S. and T. Inoue (2002). Fuzzy support vector machines for multiclass problems.

    European Symposium on Artificial Neural Networks Bruges, Belgium.

    Barzilay, O. and V. L. Brailovsky (1999). "On domain knowledge and feature

    selection using a support vector machine." Pattern Recognition Letters 20:

    475-484.

    Berrar, D., I. Bradbury, et al. (2006). "Avoiding model selection bias in small-sample

    genomic datasets." Bioinformatics 22(10): 1245-1250.

Bottou, L., C. Cortes, et al. (1994). Comparison of classifier methods: a case study in handwritten digit recognition. International Conference on Pattern Recognition, IEEE Computer Society Press.

Burges, C. (1998). "A tutorial on support vector machines for pattern recognition." Data Mining and Knowledge Discovery 2(2): 121-167.

    Duan, K.-B. and S. S. Keerthi (2005). "Which is the best multiclass SVM method? An

    empirical study." Multiple Classifier Systems: 278-285.

    Duda, R. O. and P. E. Hart (1973). Pattern classification and scene analysis, Wiley.

Hsu, C.-W., C.-C. Chang, et al. (2003). A practical guide to support vector classification. Technical report, Department of Computer Science, National Taiwan University.

    Hsu, C.-W. and C.-J. Lin (2001). "A comparison of methods for multi-class support

    vector machines." IEEE Transactions on Neural Networks 13: 415-425.

    Hsu, S. H., M. C. Chuang, et al. (2000). "A semantic differential study of designers'

    and users' product form perception." International Journal of Industrial

    Ergonomics 25: 375-391.

    Huang, H.-P. and Y.-H. Liu (2002). "Fuzzy support vector machines for pattern

    recognition and data mining." International Journal of Fuzzy Systems 4(3):

    826-835.

Inoue, T. and S. Abe (2001). Fuzzy support vector machines for pattern classification. Proceedings of the International Joint Conference on Neural Networks (IJCNN '01).


    Jindo, T., K. Hirasago, et al. (1995). "Development of a design support system for

    office chairs using 3-D graphics." International Journal of Industrial

    Ergonomics 15: 49-62.

    Kaul, A. and V. R. Rao (1995). "Research for product positioning and design

    decisions." International Journal of Research in Marketing 12: 293-320.

    Krebel, U. (1999). Pairwise classification and support vector machines. Advances in

    Kernel Methods - Support Vector Learning. B. Scholkopf, J. C. Burges and A.

    J. Smola. Cambridge, MA, MIT Press: 255-268.

    Kwahk, J. and S. H. Han (2002). "A methodology for evaluating the usability of

    audiovisual consumer electronic products." Applied Ergonomics 33: 419-431.

    Lin, C.-F. and S.-D. Wang (2002). "Fuzzy support vector machines." IEEE

    Transactions on Neural Networks 13(2): 464-471.

    Park, J. and S. H. Han (2004). "A fuzzy rule-based approach to modeling affective

    user satisfaction towards office chair design." International Journal of

    Industrial Ergonomics 34: 31-47.

    Platt, J. C., N. Cristianini, et al. (2000). Large margin DAGs for multiclass

    classification. Advances in Neural Information Processing Systems, MIT

    Press.

Keerthi, S. S. and C.-J. Lin (2003). "Asymptotic behaviors of support vector machines with Gaussian kernel." Neural Computation 15(7): 1667-1689.

Scholkopf, B., I. Guyon, et al. (2001). Statistical learning and kernel methods in bioinformatics. San Miniato.

    Shimizu, Y. and T. Jindo (1995). "A fuzzy logic analysis method for evaluating human

    sensitivities." International Journal of Industrial Ergonomics 15: 39-47.

Smits, G. F. and E. M. Jordaan (2002). Improved SVM regression using mixtures of kernels. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN '02).

Vapnik, V. N. (1995). The nature of statistical learning theory. New York, Springer.

    Wang, W., Z. Xu, et al. (2003). "Determination of the spread parameter in the

    Gaussian kernel for classification and regression." Neurocomputing 55:

    643-663.