
Research Article

Cost-Sensitive Support Vector Machine Using Randomized Dual Coordinate Descent Method for Big Class-Imbalanced Data Classification

Mingzhu Tang,1,2 Chunhua Yang,3 Kang Zhang,1 and Qiyue Xie1

1 School of Energy and Power Engineering, Changsha University of Science & Engineering, Changsha 410114, China
2 Hunan Province University Key Laboratory of Bridge Engineering, Changsha University of Science & Technology, Changsha, Hunan 410114, China
3 School of Information Science and Engineering, Central South University, Changsha 410083, China

Correspondence should be addressed to Mingzhu Tang; tmzhu@163.com

Received 14 April 2014; Accepted 31 May 2014; Published 3 July 2014

Academic Editor: Fuding Xie

Copyright © 2014 Mingzhu Tang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract and Applied Analysis, Volume 2014, Article ID 416591, 9 pages. http://dx.doi.org/10.1155/2014/416591

The cost-sensitive support vector machine is one of the most popular tools to deal with class-imbalanced problems such as fault diagnosis. However, such data appear with a huge number of examples as well as features. Aiming at the class-imbalanced problem on big data, a cost-sensitive support vector machine using the randomized dual coordinate descent method (CSVM-RDCD) is proposed in this paper. The solution of the concerned subproblem at each iteration is derived in closed form, and the computational cost is decreased through an accelerating strategy and cheap computation. The four constrained conditions of CSVM-RDCD are derived. Experimental results illustrate that the proposed method increases the recognition rate of the positive class and reduces the average misclassification cost on real big class-imbalanced data.

1. Introduction

The most popular strategy for the design of classification algorithms is to minimize the probability of error, assuming that all misclassifications have the same cost and that the classes of the dataset are balanced [1–6]. The resulting decision rules are usually called cost insensitive. However, in many important applications of machine learning, such as fault diagnosis [7] and fraud detection, certain types of error are much more costly than others. Other applications involve significantly class-imbalanced datasets, where examples from different classes appear with substantially different probability. The cost-sensitive support vector machine (CSVM) [2] is one of the most popular tools to deal with the class-imbalanced problem and the unequal misclassification cost problem. However, in many applications, such data appear with a huge number of examples as well as features.

In this work we consider the cost-sensitive support vector machine architecture [1]. Although CSVMs are based on a very solid learning-theoretic foundation and have been successfully applied to many classification problems, it is not well understood how to design big data learning for the CSVM algorithm. A CSVM usually maps training vectors into a high-dimensional space via a nonlinear function. Due to the high dimensionality of the weight vector, one solves the dual problem of the CSVM by the kernel trick. In some applications the data already appear in a rich-dimensional feature space, and the performances are similar with/without nonlinear mapping. If the data are not mapped, much larger data sets can be trained.

Recently, many methods have been proposed for linear SVMs in large-scale scenarios [4]. Among these methods, dual coordinate descent methods for the dual problem of the CSVM are among the most popular for dealing with large-scale convex optimization problems. However, they do not focus on big data learning for the CSVM. We focus on big data class-imbalanced learning with the CSVM.

This paper is organized as follows. In Section 2 the basic theory of the cost-sensitive support vector machine is described. In Section 3 we derive our proposed algorithm. Section 4 discusses both the speed-ups and the four constrained conditions of the cost-sensitive support vector machine. Implementation issues are investigated in Section 5, where experiments demonstrate the efficiency of the proposed method.

2. Basic Theory of the Cost-Sensitive Support Vector Machine

A cost-sensitive support vector machine such as the 2C-SVM is proposed in [1]. Consider the example set $\{(\mathbf{x}_i, y_i)\}_{i=1}^{l}$, $\mathbf{x}_i \in R^d$, $y_i \in Y = \{+1, -1\}$, and let $I_+ = \{i \mid y_i = +1\}$, $I_- = \{i \mid y_i = -1\}$. The 2C-SVM has the primal optimization problem

$$\min_{\mathbf{w},b,\xi} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C\gamma \sum_{i \in I_+} \xi_i + C(1-\gamma) \sum_{i \in I_-} \xi_i$$
$$\text{s.t.} \quad y_i\left(\mathbf{w}^T\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, l,$$
$$\xi_i \ge 0, \quad i = 1, 2, \ldots, l, \qquad C > 0. \tag{1}$$
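To make (1) concrete, here is a small sketch (ours, not from the paper's implementation) that evaluates the primal objective for a given (w, b), assuming dense NumPy arrays X (l × d) and y with labels ±1 and using the fact that at the optimum the slacks are ξ_i = max(0, 1 − y_i(wᵀx_i + b)):

```python
import numpy as np

def csvm_primal_objective(w, b, X, y, C, gamma):
    # Optimal slacks for fixed (w, b): xi_i = max(0, 1 - y_i (w^T x_i + b)).
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    pos, neg = (y == +1), (y == -1)
    # Problem (1): margin term plus class-weighted slack penalties.
    return 0.5 * (w @ w) + C * gamma * xi[pos].sum() + C * (1 - gamma) * xi[neg].sum()
```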

The Lagrange multiplier method can be used to solve this constrained optimization problem. In this method the Lagrange function is defined as follows:

$$L(\mathbf{w}, b, \xi) = \frac{1}{2}\|\mathbf{w}\|^2 + C\gamma \sum_{i \in I_+} \xi_i + C(1-\gamma) \sum_{i \in I_-} \xi_i + \sum_{i=1}^{l} \alpha_i \left[1 - \xi_i - y_i\left(\mathbf{w}^T\mathbf{x}_i + b\right)\right] + \sum_{i=1}^{l} \beta_i \left[-\xi_i\right], \tag{2}$$

where the $\alpha_i$, $\beta_i$ are called the Lagrange multipliers. $L$'s partial derivatives are set to zero as follows:

$$\frac{\partial L(\mathbf{w},b,\xi)}{\partial \mathbf{w}} = 0, \quad \frac{\partial L(\mathbf{w},b,\xi)}{\partial b} = 0, \quad \frac{\partial L(\mathbf{w},b,\xi)}{\partial \xi_i} = 0, \tag{3}$$

and we solve for $\mathbf{w}$, $b$, $\xi$. Formula (3) expands to

$$\frac{\partial L(\mathbf{w},b,\xi)}{\partial \mathbf{w}} = \mathbf{w} + \sum_{i=1}^{l} \alpha_i \left[-y_i \mathbf{x}_i\right] = 0,$$
$$\frac{\partial L(\mathbf{w},b,\xi)}{\partial b} = \sum_{i=1}^{l} \alpha_i \left(-y_i\right) = 0,$$
$$\frac{\partial L(\mathbf{w},b,\xi)}{\partial \xi_i} = C\gamma - \alpha_i - \beta_i = 0, \quad i \in I_+,$$
$$\frac{\partial L(\mathbf{w},b,\xi)}{\partial \xi_i} = C(1-\gamma) - \alpha_i - \beta_i = 0, \quad i \in I_-. \tag{4}$$

Equation (4) can be reformatted as

$$\mathbf{w} = \sum_{i=1}^{l} \alpha_i y_i \mathbf{x}_i, \tag{5}$$
$$\sum_{i=1}^{l} \alpha_i y_i = 0, \tag{6}$$
$$C\gamma - \alpha_i - \beta_i = 0, \ i \in I_+; \qquad C(1-\gamma) - \alpha_i - \beta_i = 0, \ i \in I_-. \tag{7}$$

By incorporating (5) to (7), the Lagrange function (2) is rewritten as

$$L(\mathbf{w}, b, \xi) = \frac{1}{2}\left\|\sum_{i=1}^{l}\alpha_i y_i \mathbf{x}_i\right\|^2 + C\gamma\sum_{i\in I_+}\xi_i + C(1-\gamma)\sum_{i\in I_-}\xi_i + \sum_{i=1}^{l}\alpha_i\left[1 - \xi_i - y_i\left(\left(\sum_{j=1}^{l}\alpha_j y_j \mathbf{x}_j\right)^{T}\mathbf{x}_i + b\right)\right] + \sum_{i\in I_+}\left(C\gamma - \alpha_i\right)\left[-\xi_i\right] + \sum_{i\in I_-}\left(C(1-\gamma) - \alpha_i\right)\left[-\xi_i\right]. \tag{8}$$

The Lagrange function (8) simplifies to

$$L(\mathbf{w}, b, \xi) = -\frac{1}{2}\sum_{i=1}^{l}\alpha_i\left[y_i\left(\sum_{j=1}^{l}\alpha_j y_j \mathbf{x}_j\right)^{T}\mathbf{x}_i\right] + \sum_{i=1}^{l}\alpha_i. \tag{9}$$

Recall that the equation above is obtained by minimizing $L$ with respect to $\mathbf{w}$ and $b$. Putting this together with the constraints $\alpha_i \ge 0$ and the constraints (5) to (7), the following dual optimization problem of the 2C-SVM is obtained:

$$\min_{\alpha} \quad f(\alpha) = \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j \mathbf{x}_i^T\mathbf{x}_j - \sum_{i=1}^{l}\alpha_i$$
$$\text{s.t.} \quad 0 \le \alpha_i \le C\gamma, \ i \in I_+, \qquad 0 \le \alpha_i \le C(1-\gamma), \ i \in I_-, \qquad \sum_{i=1}^{l}\alpha_i y_i = 0, \tag{10}$$

where $\alpha_i$, $i = 1, \ldots, l$, are the Lagrange multipliers and $C\gamma$, $C(1-\gamma)$, with $C > 0$, $\gamma \in [0,1]$, are the misclassification cost parameters. The matrix $Q$ and the vector $\alpha$ are defined, respectively, as follows:

$$Q = \left[Q_{ij}\right]_{l \times l}, \quad \text{where } Q_{ij} = y_i y_j \mathbf{x}_i^T\mathbf{x}_j, \qquad \alpha = \left[\alpha_1, \ldots, \alpha_i, \ldots, \alpha_j, \ldots, \alpha_l\right]^T, \tag{11}$$

so that $f(\alpha) = \frac{1}{2}\alpha^T Q\alpha - \mathbf{1}^T\alpha$.
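For reference, the dual objective of (10) with $Q$ as in (11) can be evaluated directly; this illustration-only sketch (ours) forms $Q$ explicitly and is therefore feasible only for small l — avoiding exactly this is the point of Sections 3 and 4:

```python
import numpy as np

def dual_objective(alpha, X, y):
    Yx = y[:, None] * X          # row i holds y_i * x_i
    Q = Yx @ Yx.T                # Q_ij = y_i y_j x_i^T x_j, as in (11)
    return 0.5 * (alpha @ Q @ alpha) - alpha.sum()
```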

3. Cost-Sensitive Support Vector Machine Using the Randomized Dual Coordinate Descent Method

In this section the randomized dual coordinate descent method is used to solve the 2C-SVM, which is one version of the cost-sensitive support vector machine. The optimization process starts from an initial point $\alpha^0 \in R^l$ and generates a sequence of vectors $\{\alpha^k\}$, $k = 0, \ldots, \infty$. The process from $\alpha^k$ to $\alpha^{k+1}$ is called an outer iteration. Each outer iteration consists of $l$ inner iterations, in which $\alpha_1, \alpha_2, \ldots, \alpha_l$ are sequentially updated. Each outer iteration thus generates vectors $\alpha^{k,i} \in R^l$, $i = 1, 2, \ldots, l+1$, such that

$$\alpha^{k,1} = \alpha^k,$$
$$\alpha^{k,i} = \left[\alpha_1^{k+1}, \ldots, \alpha_{i-1}^{k+1}, \alpha_i^{k}, \ldots, \alpha_l^{k}\right]^T, \quad \forall i = 2, \ldots, l,$$
$$\alpha^{k,l+1} = \alpha^{k+1}. \tag{12}$$

For updating $\alpha^{k,i}$ to $\alpha^{k,i+1}$, the following one-variable subproblem is solved:

$$\min_{d} \ f\left(\alpha^{k,i} + d\,\mathbf{e}_i\right), \tag{13}$$

where $\mathbf{e}_i = [0, \ldots, 0, 1, 0, \ldots, 0]^T$. This is the general process of a one-variable coordinate descent method. However, the optimization problem (13) carries the sum constraint

$$\sum_{i=1}^{l} \alpha_i y_i = 0, \tag{14}$$

in which each $\alpha_i$ is exactly determined by the other $\alpha$'s: if we hold $\alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_l$ fixed, then we cannot make any change to $\alpha_i$ without violating the constraint (14) of the optimization problem.

Thus, if we want to update some subset of the $\alpha$'s, we must update at least two of them simultaneously in order to keep satisfying the constraints. Suppose we currently have a setting of the $\alpha$'s that satisfies (14), and we decide to hold all multipliers except $\alpha_i$, $\alpha_j$ fixed and reoptimize the dual problem (10) of the CSVM with respect to $\alpha_i$, $\alpha_j$ (subject to the constraints). Equation (6) is reformatted as

$$\alpha_i y_i + \alpha_j y_j = -\sum_{m \ne i,j} \alpha_m y_m. \tag{15}$$

Since the right-hand side is fixed, we can denote it by some constant $\zeta$:

$$\zeta = -\sum_{m \ne i,j} \alpha_m y_m. \tag{16}$$

The coordinates before and after the update with respect to $\alpha_i$, $\alpha_j$ then satisfy

$$\alpha_i^{k} y_i + \alpha_j^{k} y_j = \alpha_i^{k+1} y_i + \alpha_j^{k+1} y_j = \zeta. \tag{17}$$

Here $\alpha_i^{k} y_i$ and $\alpha_i^{k+1} y_i$ are the coordinates before and after the update with respect to $\alpha_i^{k}$. Consider the following dual two-variable subproblem of the 2C-SVM:

$$f\left(\alpha^{k,i} + d_i\mathbf{e}_i\right) + f\left(\alpha^{k,j} + d_j\mathbf{e}_j\right) = \frac{1}{2}Q_{ii}d_i^2 + \left[\left(\alpha^{k,i}Q\right)_i - 1\right]d_i + \frac{1}{2}Q_{jj}d_j^2 + \left[\left(\alpha^{k,j}Q\right)_j - 1\right]d_j + c, \tag{18}$$

where $c$ is a constant, $Q_i = [Q_{i1}, \ldots, Q_{il}]$, and

$$\left(\alpha^{k,i}Q\right)_i = \sum_{s=1}^{l} Q_{is}\,\alpha_s^{k,i}. \tag{19}$$

From (18) we obtain

$$f'(\alpha) = Q\alpha - \mathbf{1}, \tag{20}$$
$$\frac{\partial f\left(\alpha^{k,i}\right)}{\partial \alpha_i} = \left(\alpha^{k,i}Q\right)_i - 1, \tag{21}$$

where $\partial f(\alpha^{k,i})/\partial\alpha_i$ is the $i$th component of the gradient $f'(\alpha)$. By incorporating (21), (18) is rewritten as

$$f\left(\alpha^{k,i} + d_i\mathbf{e}_i\right) + f\left(\alpha^{k,j} + d_j\mathbf{e}_j\right) = \frac{1}{2}Q_{ii}d_i^2 + \frac{\partial f\left(\alpha^{k,i}\right)}{\partial \alpha_i}d_i + \frac{1}{2}Q_{jj}d_j^2 + \frac{\partial f\left(\alpha^{k,j}\right)}{\partial \alpha_j}d_j + c. \tag{22}$$

From (17) we have $d_j = -d_i y_i y_j$. Equation (22) is then reformed as follows:

$$f\left(\alpha^{k,i} + d_i\mathbf{e}_i\right) + f\left(\alpha^{k,j} + d_j\mathbf{e}_j\right) = \frac{1}{2}Q_{ii}d_i^2 + \frac{\partial f\left(\alpha^{k,i}\right)}{\partial \alpha_i}d_i + \frac{1}{2}Q_{jj}d_i^2 - \frac{\partial f\left(\alpha^{k,j}\right)}{\partial \alpha_j}d_i y_i y_j + c = \frac{1}{2}\left(Q_{ii} + Q_{jj}\right)d_i^2 + \left(\frac{\partial f\left(\alpha^{k,i}\right)}{\partial \alpha_i} - \frac{\partial f\left(\alpha^{k,j}\right)}{\partial \alpha_j}y_i y_j\right)d_i + c. \tag{23}$$

Treating $(Q_{ii}+Q_{jj})$, $\left(\partial f(\alpha^{k,i})/\partial\alpha_i - \left(\partial f(\alpha^{k,j})/\partial\alpha_j\right)y_iy_j\right)$, and $c$ as constants, we can verify that this is just a quadratic function in $d_i$, which we can easily minimize by setting its derivative to zero. The two-variable subproblem thus reduces to the one-variable optimization problem

$$\min_{d_i} \ f\left(\alpha^{k,i} + d_i\mathbf{e}_i\right) + f\left(\alpha^{k,j} + d_j\mathbf{e}_j\right). \tag{24}$$

Its closed-form solution is derived as follows:

$$d_i = -\frac{\partial f\left(\alpha^{k,i}\right)/\partial\alpha_i - \left(\partial f\left(\alpha^{k,j}\right)/\partial\alpha_j\right)y_i y_j}{Q_{ii} + Q_{jj}}. \tag{25}$$

We now consider the box constraints in (10) for the two-variable optimization problem. The box constraints $0 \le \alpha_i \le C\gamma$, $i \in I_+$, and $0 \le \alpha_i \le C(1-\gamma)$, $i \in I_-$, fall into four box-constraint cases according to the labels of the examples at the two chosen coordinates.

Firstly, suppose that the labels of the examples are

$$y_i = +1, \quad y_j = -1. \tag{26}$$

The sum constraint of the Lagrange multipliers for these labels is

$$\alpha_i y_i + \alpha_j y_j = \zeta, \tag{27}$$

which here can be expressed as

$$\alpha_i - \alpha_j = \zeta, \quad \text{or} \quad \alpha_i = \alpha_j + \zeta. \tag{28}$$

The box constraints of the Lagrange multipliers are

$$0 \le \alpha_i \le \gamma C, \qquad 0 \le \alpha_j \le (1-\gamma)C. \tag{29}$$

Since the updated $\alpha_i$ must satisfy both $0 \le \alpha_i \le \gamma C$ and $0 \le \alpha_i - \zeta \le (1-\gamma)C$, we obtain the new expression of the box constraints of the Lagrange multipliers:

$$\max(0, \zeta) \le \alpha_i \le \min\left(\gamma C,\ \zeta + (1-\gamma)C\right). \tag{30}$$

Thus, evaluating $\zeta = \alpha_i - \alpha_j$ at the current point, we obtain stricter box constraints on the Lagrange multipliers $\alpha_i$, $\alpha_j$ for $y_i = +1$, $y_j = -1$:

$$U_1 = \max\left(0,\ \alpha_i - \alpha_j\right), \qquad V_1 = \min\left(\alpha_i - \alpha_j + (1-\gamma)C,\ \gamma C\right). \tag{31}$$

Secondly, suppose that the labels of the examples are

$$y_i = +1, \quad y_j = +1. \tag{32}$$

Similarly, stricter box constraints on the Lagrange multipliers $\alpha_i$, $\alpha_j$ are obtained as follows:

$$U_2 = \max\left(0,\ \alpha_i + \alpha_j - \gamma C\right), \qquad V_2 = \min\left(\alpha_i + \alpha_j,\ \gamma C\right). \tag{33}$$

Thirdly, suppose that the labels of the examples are

$$y_i = -1, \quad y_j = -1. \tag{34}$$

The stricter box constraints are

$$U_3 = \max\left(0,\ \alpha_i + \alpha_j - (1-\gamma)C\right), \qquad V_3 = \min\left(\alpha_i + \alpha_j,\ (1-\gamma)C\right). \tag{35}$$

Finally, suppose that the labels of the examples are

$$y_i = -1, \quad y_j = +1. \tag{36}$$

The stricter box constraints are

$$U_4 = \max\left(0,\ \alpha_i - \alpha_j\right), \qquad V_4 = \min\left(\alpha_i - \alpha_j + \gamma C,\ (1-\gamma)C\right). \tag{37}$$

For simplification, set

$$\alpha_i^{\text{temp}} = \alpha_i^{k} + d_i. \tag{38}$$

$\alpha_i^{k+1}$ is defined as the temporary solution clipped so as to satisfy the box constraints: with $m \in \{1,2,3,4\}$ denoting the label case above,

$$\alpha_i^{k+1} = \begin{cases} V_m & \text{if } \alpha_i^{\text{temp}} > V_m, \\ \alpha_i^{\text{temp}} & \text{if } U_m \le \alpha_i^{\text{temp}} \le V_m, \\ U_m & \text{if } \alpha_i^{\text{temp}} < U_m. \end{cases} \tag{39}$$

From the linear constraint (17) with respect to $\alpha_i^{k+1}$, the value of $\alpha_j^{k+1}$ is obtained as

$$\alpha_j^{k+1} = y_j\left(y_i\alpha_i^{k} + y_j\alpha_j^{k} - y_i\alpha_i^{k+1}\right). \tag{40}$$

4. The Modified Proposed Method

4.1. Two Speeding-Up Strategies. To avoid duplicate and invalid calculations in the iterative process, coordinate pairs satisfying either of the two conditions given below are not updated. If one of the two conditions is met, the algorithm skips the iteration, which can significantly reduce the computational load and accelerate the convergence speed.

Condition 1. If $\alpha_i^{k} = \alpha_j^{k} = 0$ or $\alpha_i^{k} = \alpha_j^{k} = C$, then the pair is not updated.

Due to the box constraints of (10), many coordinates sit on the boundary ($\alpha_i^{k} = C$ or $\alpha_i^{k} = 0$) during the computing process. If the two chosen coordinates $\alpha_i^{k}$ and $\alpha_j^{k}$ both have the value $0$ or $C$ in an iteration, the analytical solution of the two-variable subproblem leaves the coordinates unchanged, so the update can be skipped without calculating it. The reason is that formula (17) keeps $\alpha_i^{k} + \alpha_j^{k} = 0$ or $\alpha_i^{k} + \alpha_j^{k} = 2C$ fixed, while the box constraint $0 \le \alpha_i^{k} \le C$ then forces the clipped result back to $0$ or $C$; the constrained coordinates are ultimately edited back to $0$ or $C$ and are therefore not updated.

Condition 2. If the projected gradient is $0$, the pair is not updated.
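A minimal check for Condition 1 might look as follows (our sketch; Condition 2 would additionally test whether the projected gradient of (10) vanishes at the current pair):

```python
def skip_pair(a_i, a_j, C):
    # Condition 1: both multipliers stuck at the same bound; the clipped
    # update (39)-(40) would return them unchanged, so skip the pair.
    return (a_i == 0.0 and a_j == 0.0) or (a_i == C and a_j == C)
```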

4.2. Dual Problem of the Cost-Sensitive Support Vector Machine Using the Randomized Dual Coordinate Descent Method and Its Complexity Analysis. From the point of view of the above algorithm derivation, solving the CSVM seems to have been successful, but the computational complexity of the solving process is still large. Assume that the average number of nonzero features of each sample is $n$. Firstly, the computational complexity of the inner products $Q_{ii} = y_i\mathbf{x}_i^T\mathbf{x}_i y_i$, $i = 1, 2, \ldots, l$, is $O(ln)$, but this process can be carried out in advance and the results stored in memory. Secondly, the calculation of $(\alpha^{k,i}Q)_i = \sum_{s=1}^{l} Q_{is}\alpha_s^{k,i}$ takes computational complexity $O(ln)$, which is a very large amount of calculation when the data size is large. However, the linear CSVM model gives the relationship

$$\mathbf{w} = \sum_{s=1}^{l} \alpha_s^{k}\, y_s \mathbf{x}_s. \tag{41}$$

Thus $(\alpha^{k,i}Q)_i$ is further simplified as follows:

$$\left(\alpha^{k,i}Q\right)_i = \sum_{s=1}^{l} Q_{is}\alpha_s^{k,i} = \sum_{s=1}^{l} \alpha_s^{k,i} y_s \mathbf{x}_s^T \mathbf{x}_i y_i = \mathbf{w}^T\mathbf{x}_i y_i. \tag{42}$$

Solving the corresponding subproblem (23) is then simplified as follows:

$$d_i = -\frac{\left(\alpha^{k,i}Q\right)_i - 1 - \left[\left(\alpha^{k,j}Q\right)_j - 1\right]y_i y_j}{Q_{ii} + Q_{jj}} = -\frac{\mathbf{w}^T\mathbf{x}_i y_i - 1 - \left(\mathbf{w}^T\mathbf{x}_j y_j - 1\right)y_i y_j}{Q_{ii} + Q_{jj}}. \tag{43}$$

As can be seen, computing $d_i$ with complexity $O(ln)$ becomes computing $\mathbf{w}^T\mathbf{x}_i y_i$ with complexity $O(n)$; the number of operations shrinks by a factor of about $l$. However,

$$\mathbf{w} = \sum_{s=1}^{l} \alpha_s^{k}\, y_s \mathbf{x}_s, \tag{44}$$

so recomputing the updated $\mathbf{w}$ from scratch would still have complexity $O(ln)$. The amount of calculation can be reduced significantly when $\mathbf{w}$ is updated through the change in $\alpha^{k}$. Let $\alpha_i^{k}$, $\alpha_j^{k}$ be the values of the currently selected coordinates and $\alpha_i^{k+1}$, $\alpha_j^{k+1}$ the updated values; then $\mathbf{w}$ can be updated in a simple way:

$$\mathbf{w} \leftarrow \mathbf{w} + \left(\alpha_i^{k+1} - \alpha_i^{k}\right)y_i\mathbf{x}_i + \left(\alpha_j^{k+1} - \alpha_j^{k}\right)y_j\mathbf{x}_j. \tag{45}$$

Its computational complexity is only $O(n)$. So whether calculating $d_i$ or updating $\mathbf{w}$, the coordinate gradient computation has complexity $O(n)$, which is one of the reasons for the rapid convergence speed of the coordinate gradient method.
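The two O(n) kernels of the method, the gradient step (42)-(43) and the maintenance of w in (45), are short; this sketch (ours) assumes dense NumPy arrays and a precomputed vector Q_diag of the Q_ii values:

```python
def pair_step(w, X, y, Q_diag, i, j):
    # (42): once w is maintained, (alpha Q)_i reduces to w^T x_i y_i, cost O(n).
    g_i = y[i] * (X[i] @ w) - 1.0
    g_j = y[j] * (X[j] @ w) - 1.0
    # (43): closed-form step for the chosen pair.
    return -(g_i - g_j * y[i] * y[j]) / (Q_diag[i] + Q_diag[j])

def maintain_w(w, X, y, i, j, delta_i, delta_j):
    # (45): O(n) incremental update of w, where delta = alpha_new - alpha_old.
    w += delta_i * y[i] * X[i] + delta_j * y[j] * X[j]
```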

When assigning an initial value, based on the constraints

$$\sum_{i=1}^{l}\alpha_i y_i = 0, \qquad 0 \le \alpha_i \le \gamma C, \quad 0 \le \alpha_j \le (1-\gamma)C, \tag{46}$$

set the initial point

$$\alpha^0 = \left[\ldots, \alpha_i, \ldots, \alpha_j, \ldots\right]^T \in R^{l}, \tag{47}$$

where $\alpha_i = 1/t$ for $i \in I_+$ and $\alpha_j = 1/(l-t)$ for $j \in I_-$; here $l$ and $t$ are the total number of samples and the number of positive samples, respectively, so that (46) holds. Thus the weight vector $\mathbf{w}$ of the original problem is obtained by optimizing the Lagrange multipliers $\alpha$ of the dual problem.

4.3. Description of the Cost-Sensitive Support Vector Machine Using the Randomized Dual Coordinate Descent Method. The accelerating conditions are judged as in Section 4.1. One chooses the coordinate pair $(i, j)$ to optimize, uses formula (43) to calculate $d_i$, formulas (39) and (40) to update $\alpha_i^{k}$, $\alpha_j^{k}$, and formula (45) to update $\mathbf{w}$, respectively. We can see that an inner iteration takes $O(n)$ effort. The computer memory is mainly used to store the sample information $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l$ and the inner products $Q_{ii} = y_i\mathbf{x}_i^T\mathbf{x}_i y_i$ of the sample points. The cost-sensitive support vector machine using the randomized dual coordinate descent algorithm is described as follows.

Algorithm 3. Cost-sensitive support vector machine using the randomized dual coordinate descent algorithm (CSVM-RDCD).

Input: sample information $(\mathbf{x}_i, y_i)_{i=1}^{l}$, $\alpha^0$.
Output: $\alpha^{T}$, $\mathbf{w}$.

Initialize $\alpha^0$ and the corresponding $\mathbf{w} = \sum_{i=1}^{l}\alpha_i\mathbf{x}_i y_i$.

For $k = 1, 2, \ldots, T$ do

$$\alpha^{k,1} \leftarrow \alpha^{k}. \tag{48}$$

Step 1. Randomly choose $i, j \in \{1, 2, \ldots, l\}$, $i \ne j$, and the corresponding $y_i$, $y_j$.

Table 1: Specification of the benchmark datasets. Number of ex is the number of example data points. Number of feat is the number of features. Ratio is the class imbalance ratio. Positive class specifies the target or positive class. Number is the number of the dataset.

Dataset | Number of ex | Number of feat | Ratio | Positive class | Number
KDD99 (intrusion detection) | 5209460 | 42 | 1:4 | Normal | 1
Web spam | 350000 | 254 | 1:2 | -1 | 2
MNIST | 70000 | 780 | 1:10 | 5 | 3
Covertype | 581012 | 54 | 1:211 | Cottonwood/Willow (4) | 4
SIAM1 | 28596 | 30438 | 1:2000 | 1, 6, 7, 11 | 5
SIAM11 | 28596 | 30438 | 1:716 | 11, 12 | 6

Step 2.

If $(y_i = +1, y_j = -1)$:
$$U_1 = \max\left(0,\ \alpha_i - \alpha_j\right), \qquad V_1 = \min\left(\alpha_i - \alpha_j + (1-\gamma)C,\ \gamma C\right);$$
if $(y_i = +1, y_j = +1)$:
$$U_2 = \max\left(0,\ \alpha_i + \alpha_j - \gamma C\right), \qquad V_2 = \min\left(\alpha_i + \alpha_j,\ \gamma C\right);$$
if $(y_i = -1, y_j = -1)$:
$$U_3 = \max\left(0,\ \alpha_i + \alpha_j - (1-\gamma)C\right), \qquad V_3 = \min\left(\alpha_i + \alpha_j,\ (1-\gamma)C\right);$$
if $(y_i = -1, y_j = +1)$:
$$U_4 = \max\left(0,\ \alpha_i - \alpha_j\right), \qquad V_4 = \min\left(\alpha_i - \alpha_j + \gamma C,\ (1-\gamma)C\right). \tag{49}$$

Step 3.

$$\alpha_i^{\text{temp}} = \alpha_i^{k} - \frac{\mathbf{w}^T\mathbf{x}_i y_i - 1 - \left(\mathbf{w}^T\mathbf{x}_j y_j - 1\right)y_i y_j}{Q_{ii} + Q_{jj}}. \tag{50}$$

Step 4.

If $(y_i = +1, y_j = -1)$: $m = 1$; if $(y_i = +1, y_j = +1)$: $m = 2$; if $(y_i = -1, y_j = -1)$: $m = 3$; if $(y_i = -1, y_j = +1)$: $m = 4$.

$$\alpha_i^{k+1} = \begin{cases} V_m & \text{if } \alpha_i^{\text{temp}} > V_m, \\ \alpha_i^{\text{temp}} & \text{if } U_m \le \alpha_i^{\text{temp}} \le V_m, \\ U_m & \text{if } \alpha_i^{\text{temp}} < U_m. \end{cases} \tag{51}$$

Step 5.

$$\alpha_j^{k+1} = y_j\left(y_i\alpha_i^{k} + y_j\alpha_j^{k} - y_i\alpha_i^{k+1}\right). \tag{52}$$

Step 6.

$$\mathbf{w} \leftarrow \mathbf{w} + \left(\alpha_i^{k+1} - \alpha_i^{k}\right)y_i\mathbf{x}_i + \left(\alpha_j^{k+1} - \alpha_j^{k}\right)y_j\mathbf{x}_j. \tag{53}$$

Until a stopping condition is satisfied:

$$\alpha^{k+1} \leftarrow \alpha^{k}, \text{ with coordinates } i, j \text{ replaced by } \alpha_i^{k+1}, \alpha_j^{k+1}. \tag{54}$$

End for.
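Putting Steps 1-6 together, the following is a compact runnable sketch of Algorithm 3 for the linear 2C-SVM. It is our illustration under stated assumptions (dense NumPy arrays, labels ±1, both classes present, a fixed budget of T random pair updates as the stopping condition; the skip tests of Section 4.1 are omitted for brevity), not the authors' LibSVM-based implementation:

```python
import numpy as np

def csvm_rdcd(X, y, C, gamma, T=100000, seed=0):
    rng = np.random.default_rng(seed)
    l, n = X.shape
    t = int(np.sum(y == +1))
    # Initial point (47): alpha = 1/t on I+, 1/(l-t) on I-, so sum alpha_i y_i = 0;
    # assumed feasible for the box constraints in (10).
    alpha = np.where(y == +1, 1.0 / t, 1.0 / (l - t))
    ub = np.where(y == +1, C * gamma, C * (1.0 - gamma))   # box upper bounds in (10)
    Q_diag = np.einsum('ij,ij->i', X, X)                   # Q_ii = x_i^T x_i (y_i^2 = 1)
    w = X.T @ (alpha * y)                                  # w from (41)

    for _ in range(T):
        i, j = rng.choice(l, size=2, replace=False)        # Step 1
        if y[i] != y[j]:                                   # Step 2: bounds (49)
            U = max(0.0, alpha[i] - alpha[j])
            V = min(alpha[i] - alpha[j] + ub[j], ub[i])
        else:
            U = max(0.0, alpha[i] + alpha[j] - ub[i])
            V = min(alpha[i] + alpha[j], ub[i])
        denom = Q_diag[i] + Q_diag[j]
        if denom <= 0.0:
            continue
        g_i = y[i] * (X[i] @ w) - 1.0                      # gradients via (42)
        g_j = y[j] * (X[j] @ w) - 1.0
        d_i = -(g_i - g_j * y[i] * y[j]) / denom           # Step 3: (50)
        a_i_new = min(max(alpha[i] + d_i, U), V)           # Step 4: clip (51)
        a_j_new = y[j] * (y[i] * alpha[i] + y[j] * alpha[j] - y[i] * a_i_new)  # Step 5: (52)
        # Step 6: O(n) maintenance of w, (53)
        w += (a_i_new - alpha[i]) * y[i] * X[i] + (a_j_new - alpha[j]) * y[j] * X[j]
        alpha[i], alpha[j] = a_i_new, a_j_new
    return alpha, w
```

For example, `alpha, w = csvm_rdcd(X, y, C=5.0, gamma=0.8)` gives Cγ = 4 and C(1−γ) = 1, the setting used for KDD99 in Table 2.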

5. Experiments and Analysis

5.1. Experiment on Big Class-Imbalanced Benchmark Dataset Classification Problems. In this section we analyze the performance of the proposed cost-sensitive support vector machine using the randomized dual coordinate descent method (CSVM-RDCD) and compare our implementation with state-of-the-art cost-sensitive support vector machines. Three implementations of related cost-sensitive SVMs are compared. We implemented the proposed cost-sensitive SVM using the randomized dual coordinate descent method (CSVM-RDCD) by modifying the LibSVM [8, 9] source code. Eitrich and Lang [10] proposed the parallel cost-sensitive support vector machine (PCSVM); [6] proposed the cost-sensitive support vector machine (CSSVM).

Table 1 lists the statistics of the data sets. KDD99, Web Spam, Covertype, MNIST, SIAM1, and SIAM11 are obtained from the following websites: KDD99 from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, Web Spam from http://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html, Covertype from http://archive.ics.uci.edu/ml/datasets/Covertype, MNIST from http://yann.lecun.com/exdb/mnist/, and SIAM from https://c3.nasa.gov/dashlink/resources/138/.

To evaluate the performance of the CSVM-RDCD method, we use a stratified selection to split each dataset into 9/10 for training and 1/10 for testing. We briefly describe each set below. For each dataset we choose the class with the higher cost or fewer data points as the target or positive class. All multiclass datasets were converted to binary datasets. In particular, the binary datasets SIAM1 and SIAM11 have been constructed from the same multiclass dataset but with different target classes and different imbalance ratios.

Table 2: Cost-sensitive parameters of the three algorithms using CSVM on 6 datasets.

Dataset | Cγ | C(1-γ)
KDD99 (intrusion detection) | 4 | 1
Web spam | 2 | 1
MNIST | 10 | 1
Covertype | 211 | 1
SIAM1 | 2000 | 1
SIAM11 | 716 | 1

The performance of the three algorithms using CSVM was evaluated by 10-fold cross-validation in terms of the average misclassification cost, the training time, and the recognition rate of the positive class (i.e., the recognition rate of the minority class).

The average misclassification cost (AMC) denotes the 10-fold cross-validated average misclassification cost on the test datasets for the related CSVMs, described as follows:

$$\text{AMC} = \frac{fp \times C\gamma + fn \times C(1-\gamma)}{N}, \tag{55}$$

where $fp$ and $fn$ represent the number of positive examples misclassified as negative examples and the number of negative examples misclassified as positive examples in the test dataset, respectively; $C\gamma$ and $C(1-\gamma)$ denote the cost of a positive example misclassified as negative and the cost of a negative example misclassified as positive, respectively; and $N$ denotes the number of test examples.
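Equation (55) translates directly into code. In this sketch (ours) the counting follows the paper's convention: fp counts positive examples predicted negative, and fn counts negative examples predicted positive:

```python
def average_misclassification_cost(y_true, y_pred, C, gamma):
    # Paper's convention in (55): fp = positives predicted negative,
    # fn = negatives predicted positive.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)
    return (fp * C * gamma + fn * C * (1 - gamma)) / len(y_true)
```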

The recognition rate of the positive class is the number of correctly classified positive examples divided by the number of positive examples in the testing dataset.

The training time in seconds is used to evaluate the convergence speed of the three algorithms using CSVM on the same computer.

The cost-sensitive parameters of the three algorithms using CSVM are specified in Table 2. The cost-sensitive parameters $C\gamma$ and $C(1-\gamma)$ are set according to the class ratio of the datasets, namely the ratio between the minority class and the majority class.

Three datasets with relative class imbalance are examined, namely the KDD99 (intrusion detection), Web spam, and MNIST datasets. Three datasets with severe class imbalance are examined, namely the Covertype, SIAM1, and SIAM11 datasets. The average misclassification cost comparison of the three algorithms using CSVM is shown in Figure 1 for each of the datasets. The CSVM-RDCD algorithm outperforms the PCSVM and CSSVM on all datasets.

The positive class recognition rate comparison of the three algorithms using CSVM is shown in Figure 2 for each of the datasets. The CSVM-RDCD algorithm outperforms the CSSVM on all datasets, surpasses the PCSVM on four datasets, and ties with the PCSVM on the other two datasets.

Figure 1: Average misclassification cost comparison of three algorithms using CSVM on 6 datasets.

Figure 2: The positive class recognition rate comparison of three algorithms using CSVM.

We examine large datasets with relative imbalance ratios and severe imbalance ratios to evaluate the convergence speed of the CSVM-RDCD algorithm. The training time comparison of the three algorithms using CSVM is shown in Figure 3 for each of the datasets. The CSVM-RDCD algorithm outperforms the PCSVM and CSSVM on all datasets.

5.2. Experiment on Real-World Big Class-Imbalanced Dataset Classification Problems. In order to verify the effectiveness of the proposed algorithm CSVM-RDCD on real-world big class-imbalanced data classification problems, it was evaluated using real vibration data measured on a wind turbine.

Figure 3: Training time (s) comparison of three algorithms using CSVM.

Table 3: Specification of the benchmark dataset and cost-sensitive parameters on the dataset.

Dataset | Fault diagnosis of wind turbine
Number of ex | 20000
Number of feat | 56
Ratio | 1:90
Positive class | Local crack fault
Cγ | 90
C(1-γ) | 1

The experimental data were from the SKF WindCon software and were collected from a wind turbine gearbox of type TF138-A [11]. The vibration signals were continuously acquired by an accelerometer mounted on the outer case of the gearbox.

All parameter settings for the dataset are listed in Table 3. The statistical results on the big class-imbalanced data problem that measure the quality of the results (average misclassification cost, recognition rate of the positive class, and training time) are listed in Table 4. From Table 4 it can be concluded that CSVM-RDCD is able to consistently achieve superior performance on big class-imbalanced data classification problems.

The experimental results show that solving the cost-sensitive SVM dual problem with the randomized dual coordinate descent method is practical on large-scale data sets. The proposed method achieves superior performance in terms of average misclassification cost, recognition rate of the positive class, and training time. The large-scale experiments show that the cost-sensitive support vector machine using the randomized dual coordinate descent method runs

Table 4: Classification performance of the three algorithms using CSVM on the dataset.

Name of algorithm | AMC | Recognition of positive class | Training time
CSVM-RDCD | 0.01825 | 98.5% | 543 s
PCSVM | 0.0334 | 97.6% | 615 s
CSSVM | 0.0379 | 97.2% | 734 s

more efficiently than both PCSVM and CSSVM; in particular, the randomized dual coordinate descent algorithm has an advantage in training time on large-scale data sets. CSSVM needs to build the complete gradient and kernel matrix $Q$ and needs to select complex working sets in the solving process of the decomposition algorithm. The decomposition algorithm updates the full gradient information as a whole, with computational complexity $O(ln)$ for each full gradient update; PCSVM has a similar computational complexity. The randomized dual coordinate gradient method updates the coordinates one pair at a time with computational complexity $O(n)$, which considerably increases the convergence speed of the proposed method.

6. Conclusions

The randomized dual coordinate descent method (RDCD) is an optimization algorithm that updates the global solution with the analytical solution of a subproblem obtained at each step. The RDCD method has a rapid convergence rate, which is mainly due to the following: (1) the subproblem has a formal analytical solution, which is used in the solving process without complex numerical optimization; (2) in the solving process of the RDCD method, the next component is updated on the basis of the previous component; compared with the CSSVM method, which updates the full gradient information as a whole, the objective function of the RDCD method can decline faster; (3) the single-coordinate gradient calculation of the RDCD method is simpler and easier than the full gradient calculation.

The randomized dual coordinate descent method is applied here to the cost-sensitive support vector machine, which expands the scope of application of the randomized dual coordinate descent method. For large-scale class-imbalanced problems, a cost-sensitive SVM using the randomized dual coordinate descent method is proposed. Experimental results and analysis show the effectiveness and feasibility of the proposed method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Science Fund for Distinguished Young Scholars of China (no. 61025015), the National Natural Science Foundation of China (nos. 51305046 and 61304019), the Research Foundation of the Education Bureau of Hunan Province, China (no. 12A007), the Open Fund of the Hunan Province University Key Laboratory of Bridge Engineering (Changsha University of Science and Technology), the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province (Changsha University of Science and Technology), the Key Laboratory of Efficient and Clean Energy Utilization, College of Hunan Province, and the Introduction of Talent Fund of Changsha University of Science and Technology.

References

[1] M. A. Davenport, "The 2nu-SVM: a cost-sensitive extension of the nu-SVM," Tech. Rep. TREE 0504, Department of Electrical and Computer Engineering, Rice University, 2005.

[2] M. Kim, "Large margin cost-sensitive learning of conditional random fields," Pattern Recognition, vol. 43, no. 10, pp. 3683–3692, 2010.

[3] Y.-J. Park, S.-H. Chun, and B.-C. Kim, "Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis," Artificial Intelligence in Medicine, vol. 51, no. 2, pp. 133–145, 2011.

[4] J. Kim, K. Choi, G. Kim, and Y. Suh, "Classification cost: an empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost," Expert Systems with Applications, vol. 39, no. 4, pp. 4013–4019, 2012.

[5] C.-Y. Yang, J.-S. Yang, and J.-J. Wang, "Margin calibration in SVM class-imbalanced learning," Neurocomputing, vol. 73, no. 1–3, pp. 397–411, 2009.

[6] H. Masnadi-Shirazi, N. Vasconcelos, and A. Iranmehr, "Cost-sensitive support vector machines," http://arxiv.org/abs/1212.0975.

[7] Y. Artan, M. A. Haider, D. L. Langer et al., "Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2444–2455, 2010.

[8] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in Proceedings of the 25th International Conference on Machine Learning, pp. 408–415, July 2008.

[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: a library for large linear classification," The Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.

[10] T. Eitrich and B. Lang, "Parallel cost-sensitive support vector machine software for classification," in Proceedings of the Workshop from Computational Biophysics to Systems Biology, NIC Series, pp. 141–144, John von Neumann Institute for Computing, Jülich, Germany, 2006.

[11] B. Tang, W. Liu, and T. Song, "Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner-Ville distribution," Renewable Energy, vol. 35, no. 12, pp. 2862–2866, 2010.


Page 2: Research Article Cost-Sensitive Support Vector Machine Using …downloads.hindawi.com/journals/aaa/2014/416591.pdf · 2019-07-31 · Research Article Cost-Sensitive Support Vector

2 Abstract and Applied Analysis

discusses both speed-ups and four constrained conditionsof cost-sensitive support vector machine Implementationissues are investigated in Section 5 Experiments show effi-ciently our proposed method

2 Basic Theory of Cost-Sensitive SupportVector Machine

Cost-sensitive support vector machine such as 2C-SVM isproposed in [1] Consider examples set (x

119894 119910119894)119897

119894=1 x119894isin 119877119889

119910119894isin 119884 = +1 minus1 119868

+= 119894 119910

119894= +1 119868

minus= 119894 119910

119894= minus1 The

2C-SVM has primal optimization problem

min119908119887120585

1

2w2 + 119862120574sum

119894isin119868+

120585119894+ 119862 (1 minus 120574) sum

119894isin119868minus

120585119894

st 119910119894(w119879x119894+ 119887) ge 1 minus 120585

119894119894 = 1 2 119897

120585119894ge 0 119894 = 1 2 119897

119862 gt 0

(1)

Lagrange multipliers method can be used to solve the con-strained optimization problem In this method the Lagrangeequation is defined as follows

119871 (w 119887 120585)

=1

2w2 + 119862120574sum

119894isin119868+

120585119894+ 119862 (1 minus 120574) sum

119894isin119868minus

120585119894

+

119897

sum119894=1

120572119894[1 minus 120585

119894minus 119910119894(w119879x119894+ 119887)] +

119897

sum119894=1

120573119894[minus120585119894]

(2)

where the 120572119894 120573119894rsquos are called the Lagrange multipliers 119871rsquos

partial derivatives are set to be zero as follows

120597119871 (w 119887 120585)120597w

= 0120597119871 (w 119887 120585)

120597119887= 0

120597119871 (w 119887 120585)120597120585119894

= 0

(3)

And solve w 119887 120585 The formula (3) is extended to be

120597119871 (w 119887 120585)120597w

= w +119897

sum119894=1

120572119894[minus119910119894x119894] = 0

120597119871 (w 119887 120585)120597119887

=

119897

sum119894=1

120572119894(minus119910119894) = 0

120597119871 (w 119887 120585)120597120585119894

= 119862120574 minus 120572119894minus 120573119894= 0 119894 isin 119868

+

120597119871 (w 119887 120585)120597120585119894

= 119862 (1 minus 120574) minus 120572119894minus 120573119894= 0 119894 isin 119868

minus

(4)

Equation (4) can be reformatted as

w =119897

sum119894=1

120572119894119910119894x119894 (5)

119897

sum119894=1

120572119894119910119894= 0 (6)

119862120574 minus 120572119894minus 120573119894= 0 119894 isin 119868

+

119862 (1 minus 120574) minus 120572119894minus 120573119894= 0 119894 isin 119868

minus

(7)

By incorporating (5) to (7) Lagrange equation (2) isrewritten as

119871 (119908 119887 120585)

=1

2

1003817100381710038171003817100381710038171003817100381710038171003817

119899

sum119894=1

120572119894119910119894x119894

1003817100381710038171003817100381710038171003817100381710038171003817

2

+ 119862120574sum119894isin119868+

120585119894+ 119862 (1 minus 120574) sum

119894isin119868minus

120585119894

+

119897

sum119894=1

120572119894[

[

1 minus 120585119894minus 119910119894((

119899

sum119894=1

120572119894119910119894x119894)

119879

x119894+ 119887)]

]

+ sum119894isin119868+

(119862120574 minus 120572119894) [minus120585119894] + sum119894isin119868minus

(119862 (1 minus 120574) minus 120572119894) [minus120585119894]

(8)

Lagrange equation (8) is simplified as

119871 (119908 119887 120585)

= minus1

2

119897

sum119894=1

120572119894[

[

119910119894((

119897

sum119895=1

120572119895119910119895x119895)

119879

x119894)]

]

+

119897

sum119894=1

120572119894

(9)

Recall that the equation above is obtained by minimizing119871 with respect to w and 119887 Putting this together with theconstraints 120572

119894ge 0 and the constraints (5) to (7) the following

dual optimization problem of 2C-SVM is obtained as

min120572

119891 (120572) =1

2

119897

sum119894=1

119897

sum119895=1

120572119894120572119895119910119894119910119895x119879119894x119895minus

119897

sum119894=1

120572119894

st 0 le 120572119894le 119862120574 119894 isin 119868

+

0 le 120572119894le 119862 (1 minus 120574) 119894 isin 119868

minus

119897

sum119894=1

120572119894119910119894= 0

(10)

where120572119894 119894 = 1 119897 are Lagrangemultipliers and119862120574119862(1minus120574)

119862 gt 0 120574 isin [0 1] are misclassification cost parameters Thematrixes 119876 and 120572 are defined respectively as follows

119876 = [119876119894119895]119897times119897 where 119876

119894119895= 120572119894120572119895119910119894119910119895x119879119894x119895

120572 = [1205721 120572

119894 120572

119895 120572

119897]

(11)

Abstract and Applied Analysis 3

3 Cost-Sensitive Support VectorMachine Using Randomized DualCoordinate Descent Method

In this section randomized dual coordinate descent methodis used to solve 2C-SVM that is one version of cost-sensitivesupport vectormachineTheoptimization process starts froman initial point 1205720 isin R119897 and generates a sequence of vectors120572119896 119896 = 0 infin The process from 120572119896 to 120572119896+1 is called

an outer iteration In each outer iteration 119897 is called inneriteration so that 120572

1 1205722 120572

119897are sequentially updated Each

outer iteration thus generates vectors120572119896119894 isin R119897 119894 = 1 2 119897+1 such that

1205721198961= 120572119896 119894 = 1

120572119896119894= [120572119896+1

1 120572

119896+1

119894minus1 120572119896

119894 120572

119896

119897] forall119894 = 2 119897

120572119896119897= 120572119896 119894 = 119897 + 1

(12)

For updating 120572119896119894 to 120572119896119894+1 the following one-variablesubproblem is solved as

min119889

119891 (120572119896119894+ 119889119890119894) (13)

where 119890119894= [0 1 0]

119879 This is general process ofone-variable coordinate descent method However the sumconstrained condition of the optimization problem (13) isdenoted as follows

119899

sum119894=1

120572119894119910119894= 0 (14)

where 120572119894is exactly determined by the other 120572

119894s and if we

were to hold 120572119894119899

119894=1 120572119894fixed then we cannot make any

change 120572119894without violating the constraint condition (14) in

the optimization problemThus if we want to update some subject of the 120572

119894s we

must update at least two of them simultaneously in orderto keep satisfying the constraints We currently have somesetting of the 120572

119894s that satisfy the constraint conditions (14)

and suppose we have decided to hold 120572119894119899

119894=1 120572119894 120572119895fixed and

reoptimize the dual problem of CSVM (10) with respect to 120572119894

120572119895(subject to the constraints) Equation (6) is reformatted as

120572119894119910119894+ 120572119895119910119895=

119897

sum119898=1

120572119898119910119898minus 120572119894119910119894minus 120572119895119910119895 (15)

Since the right hand side is fixed we can just let it bedenoted by some constant 120577

120577 =

119897

sum119898=1

120572119898119910119898minus 120572119894119910119894minus 120572119895119910119895 (16)

We can form the updated coordinates before and afterwith respect to 120572

119894 120572119895

120572119896

119894119910119894+ 120572119896

119895119910119895= 120572119896+1

119894119910119894+ 120572119896+1

119895119910119895= 120577 (17)

120572119896

119894119910119894 120572119896+1119894119910119894are updated coordinates before and after with

respect to 120572119896119894 Consider the following dual two-variable

subproblem of 2C-SVM

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895)

= 1198761198941198941198892

119894+ 2(120572

119896119894119876)119894minus 1 119889

119894

+ 1198761198951198951198892

119895+ 2(120572

119896119895119876)119895minus 1 119889

119895+ 119888

(18)

where 119888 is constant 119876119894= [119876119894119904 119876

119894119904minus1 119876119894119904 119876

119894119897] and

(120572119896119894119876)119894=

119897

sum119904=1

119876119894119904120572119896119904

119894 (19)

From (18) we obtain

1198911015840(120572) = 120572

119879119876 minus 1 (20)

120597119891 (120572119896119894)

120597120572119894

= 2(120572119896119894119876)119894minus 1 (21)

where 120597119891(120572119896119894)120597120572119894is the 119894th component of the gradient1198911015840(120572)

By incorporating (21) (18) is rewritten as

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895)

= 1198761198941198941198892

119894+120597119891 (120572119896119894)

120597120572119894

119889119894+ 1198761198951198951198892

119895+120597119891 (120572119896119895)

120597120572119895

119889119895+ 119888

(22)

From (17) we have 119889119895= minus119889

119894119910119894119910119895 Equation (22) is

reformed as follows

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895)

= 1198761198941198941198892

119894+120597119891 (120572119896119894)

120597120572119894

119889119894+ 1198761198951198951198892

119894minus120597119891 (120572119896119895)

120597120572119895

119889119894119910119894119910119895+ 119888

= (119876119894119894+ 119876119895119895) 1198892

119894+ (

120597119891 (120572119896119894)

120597120572119894

minus120597119891 (120572119896119895)

120597120572119895

119910119894119910119895)119889119894+ 119888

(23)

Treating (119876119894119894+119876119895119895) ((120597119891(120572119896119894)120597120572

119894) minus (120597119891(120572

119896119895)120597120572119895)119910119894119910119895)

119888 as constants we should be able to verify that this is justsome quadratic function in 119889

119894 We can easily maximize

this quadratic function by setting its derivative to zero andsolving the optimization problemThe following two-variableoptimization problem is obtained as

min119889119894

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895) (24)

4 Abstract and Applied Analysis

The closed form is derived as follows

119889119894=(120597119891 (120572

119896119894) 120597120572119894) minus (120597119891 (120572

119896119895) 120597120572119895) (119910119894119910119895)

2 (119876119894119894+ 119876119895119895)

(25)

We now consider the box constraints (10) of the two-variable optimization problem The box constraints as 0 le120572119894le 119862120574 119894 isin 119868

+ 0 le 120572

119894le 119862(1 minus 120574) 119894 isin 119868

minusare classified

to four boxes constraints according to the labels of examplesfrom some two coordinates

Firstly suppose that the labels of examples are as

119910119894= +1 119910

119895= minus1 (26)

The sum constraints of Lagrange multipliers according tothese labels are as

120572119894119910119894+ 120572119895119910119895= 120577 (27)

Another expressing of sum constraints of Lagrange mul-tipliers is as

120572119894minus 120572119895= 120577

or 120572119894= 120572119895+ 120577

(28)

The box constraints of Lagrangemultipliers are defined as

0 le 120572119894le 120574119862

0 le 120572119895le (1 minus 120574)119862

(29)

We obtain the new expressing of box constraints ofLagrange multipliers as the following

0 le 120577 le (1 minus 120574)119862

0 le 120572119894le 120572119894minus 120572119895minus (1 minus 120574)119862

or minus (1 minus 120574)119862 le 120577 le 0

120572119894minus 120572119895le 120572119894le 120574119862

(30)

Thus we obtain stricter box constraints of Lagrangemultipliers 120572

119894 120572119895according to 119910

119894= +1 119910

119895= minus1 as the

following

1198801= max (0 120572

119894minus 120572119895)

1198811= min (120572

119894minus 120572119895minus (1 minus 120574)119862 120574119862)

(31)

Secondly, suppose that the labels of the examples are

\[ y_i = +1, \qquad y_j = +1. \tag{32} \]

Similarly, stricter box constraints on the Lagrange multipliers \(\alpha_i, \alpha_j\) are obtained as follows:

\[ U_2 = \max(0,\ \alpha_i + \alpha_j - \gamma C), \qquad V_2 = \min(\alpha_i + \alpha_j,\ \gamma C). \tag{33} \]

Thirdly, suppose that the labels of the examples are

\[ y_i = -1, \qquad y_j = -1. \tag{34} \]

Similarly, the stricter box constraints on the Lagrange multipliers \(\alpha_i, \alpha_j\) are obtained as follows:

\[ U_3 = \max\bigl(0,\ \alpha_i + \alpha_j - (1-\gamma)C\bigr), \qquad V_3 = \min\bigl(\alpha_i + \alpha_j,\ (1-\gamma)C\bigr). \tag{35} \]

Finally, suppose that the labels of the examples are

\[ y_i = -1, \qquad y_j = +1. \tag{36} \]

Similarly, the stricter box constraints on the Lagrange multipliers \(\alpha_i, \alpha_j\) are obtained as follows:

\[ U_4 = \max(0,\ \alpha_i - \alpha_j), \qquad V_4 = \min\bigl(\gamma C + \alpha_i - \alpha_j,\ (1-\gamma)C\bigr). \tag{37} \]
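To make the four cases concrete, the following minimal Python sketch collects the bounds (31), (33), (35), and (37) in one helper. The function name and signature are illustrative rather than taken from the original implementation; `C_pos` and `C_neg` stand for the per-class bounds \(\gamma C\) and \((1-\gamma)C\).

```python
def box_bounds(alpha_i, alpha_j, y_i, y_j, C_pos, C_neg):
    """Feasible interval [U, V] for the updated alpha_i, per (31)/(33)/(35)/(37).

    alpha_i, alpha_j: current values of the two chosen multipliers;
    C_pos = gamma * C (bound for positive examples),
    C_neg = (1 - gamma) * C (bound for negative examples).
    """
    if y_i == +1 and y_j == -1:    # case 1, bounds (31)
        return max(0.0, alpha_i - alpha_j), min(alpha_i - alpha_j + C_neg, C_pos)
    if y_i == +1 and y_j == +1:    # case 2, bounds (33)
        return max(0.0, alpha_i + alpha_j - C_pos), min(alpha_i + alpha_j, C_pos)
    if y_i == -1 and y_j == -1:    # case 3, bounds (35)
        return max(0.0, alpha_i + alpha_j - C_neg), min(alpha_i + alpha_j, C_neg)
    # case 4, bounds (37): y_i == -1, y_j == +1
    return max(0.0, alpha_i - alpha_j), min(C_pos + alpha_i - alpha_j, C_neg)
```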

For simplicity, set

\[ \alpha_i^{\text{temp}} = \alpha_i^{k} + d_i. \tag{38} \]

The updated value \(\alpha_i^{k+1}\) is obtained by clipping this temporary solution into the feasible interval \([U_m, V_m]\) of the matching case \(m \in \{1,2,3,4\}\):

\[
\alpha_i^{k+1} =
\begin{cases}
V_m & \text{if } \alpha_i^{\text{temp}} > V_m, \\
\alpha_i^{\text{temp}} & \text{if } U_m \le \alpha_i^{\text{temp}} \le V_m, \\
U_m & \text{if } \alpha_i^{\text{temp}} < U_m.
\end{cases}
\tag{39}
\]

From the linear constraint (17) with respect to \(\alpha_i^{k+1}\), the value of \(\alpha_j^{k+1}\) is obtained as

\[ \alpha_j^{k+1} = y_j \bigl( y_i \alpha_i^{k} + y_j \alpha_j^{k} - y_i \alpha_i^{k+1} \bigr). \tag{40} \]
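The clipping rule (39) and the partner update (40) translate directly into code. A small sketch, with illustrative helper names:

```python
def clip(alpha_temp, U, V):
    """Projection (39): clip the tentative value into the feasible interval [U, V]."""
    return min(max(alpha_temp, U), V)

def partner_update(alpha_i_old, alpha_j_old, alpha_i_new, y_i, y_j):
    """Equation (40): choose alpha_j so that alpha_i*y_i + alpha_j*y_j stays constant."""
    return y_j * (y_i * alpha_i_old + y_j * alpha_j_old - y_i * alpha_i_new)
```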

4. The Modified Proposed Method

4.1. Two Speeding-Up Strategies. To avoid duplicate and invalid calculations in the iterative process, updates are skipped under two given conditions. If one of the two conditions is met, the algorithm skips the iteration, which significantly reduces the computational load and accelerates the convergence speed.

Condition 1. If \(\alpha_i^{k} = \alpha_j^{k} = 0\) or \(\alpha_i^{k} = \alpha_j^{k} = C\), then the chosen coordinates are not updated.


Due to the box constraints of (10), many coordinates hit their boundary values (\(\alpha_i^{k} = C\) or \(\alpha_i^{k} = 0\)) during the computation. If the two chosen coordinates \(\alpha_i^{k}\) and \(\alpha_j^{k}\) both take the value 0 or \(C\) in an iteration, the analytical solution of the two-variable subproblem leaves the coordinates unchanged, so the update can be skipped without calculation. The reason is that formula (17) then guarantees \(\alpha_i^{k} + \alpha_j^{k} = 0\) or \(\alpha_i^{k} + \alpha_j^{k} = 2C\), and the box constraint \(0 \le \alpha_i^{k} \le C\) forces the clipped result back to 0 or \(C\); hence the coordinates are not updated.

Condition 2. If the projected gradient is 0, the chosen coordinates are not updated.

4.2. Dual Problem of Cost-Sensitive Support Vector Machine Using Randomized Dual Coordinate Descent Method and Its Complexity Analysis. From the above derivation the solving of CSVM appears successful, but the computational cost of a naive implementation is still large. Assume that the average number of nonzero features per sample is \(n\). Firstly, computing the inner products \(Q_{ii} = y_i x_i^{T} x_i y_i\), \(i = 1, 2, \ldots, l\), costs \(O(ln)\), but this can be done in advance and stored in memory. Secondly, the calculation of \((\alpha^{k} Q)_i = \sum_{s=1}^{l} Q_{is}\, \alpha_s^{k}\) takes \(O(ln)\) per iteration, which is very expensive when the data size is large. However, the linear CSVM model satisfies

\[ w = \sum_{s=1}^{l} \alpha_s^{k}\, x_s y_s. \tag{41} \]

Thus \((\alpha^{k} Q)_i\) is further simplified as follows:

\[
(\alpha^{k} Q)_i = \sum_{s=1}^{l} Q_{is}\, \alpha_s^{k}
= \Bigl( \sum_{s=1}^{l} \alpha_s^{k}\, y_s x_s \Bigr)^{T} x_i y_i
= w^{T} x_i y_i.
\tag{42}
\]

By (21) and (42), the closed-form solution (25) can accordingly be computed as

\[
d_i = -\,\frac{2(\alpha^{k} Q)_i - 1 - \bigl(2(\alpha^{k} Q)_j - 1\bigr) y_i y_j}{2\,(Q_{ii} + Q_{jj})}
= -\,\frac{2 w^{T} x_i y_i - 1 - \bigl(2 w^{T} x_j y_j - 1\bigr) y_i y_j}{2\,(Q_{ii} + Q_{jj})}.
\tag{43}
\]
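The shortcut (42)-(43) is easy to express in code. A dense-NumPy sketch, under the assumption that \(x_i\) is a dense feature vector (the real benefit appears with sparse features, where each dot product costs only the number of nonzeros):

```python
import numpy as np

def step_d_i(w, x_i, y_i, x_j, y_j, Q_ii, Q_jj):
    """Closed-form step (43): both gradient components come from w in O(n)."""
    g_i = 2.0 * y_i * np.dot(w, x_i) - 1.0   # (21) with (42): df/dalpha_i
    g_j = 2.0 * y_j * np.dot(w, x_j) - 1.0   # df/dalpha_j
    # minimizer of the quadratic model (23) in d_i
    return -(g_i - g_j * y_i * y_j) / (2.0 * (Q_ii + Q_jj))
```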

As can be seen, computing \(d_i\) with complexity \(O(ln)\) becomes computing \(w^{T} x_i y_i\) with complexity \(O(n)\); the calculation is thus cheaper by a factor of about \(l\). However, the weight vector

\[ w = \sum_{s=1}^{l} \alpha_s^{k}\, x_s y_s \tag{44} \]

still costs \(O(ln)\) if it is recomputed from scratch. The amount of calculation can be reduced significantly when \(w\) is updated incrementally as the \(\alpha_s^{k}\) change. Let \(\alpha_i^{k}, \alpha_j^{k}\) be the values of the currently selected coordinates and \(\alpha_i^{k+1}, \alpha_j^{k+1}\) the updated values; then \(w\) can be updated in a simple way:

\[ w \;\leftarrow\; w + \bigl(\alpha_i^{k+1} - \alpha_i^{k}\bigr) x_i y_i + \bigl(\alpha_j^{k+1} - \alpha_j^{k}\bigr) x_j y_j. \tag{45} \]

Its computational complexity is only \(O(n)\). So whether calculating \(d_i\) or updating \(w\), the per-coordinate gradient computation costs \(O(n)\), which is one of the reasons for the rapid convergence speed of the coordinate gradient method.
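A one-line sketch of the incremental update (45), assuming NumPy arrays; `da_i` and `da_j` denote the changes \(\alpha_i^{k+1} - \alpha_i^{k}\) and \(\alpha_j^{k+1} - \alpha_j^{k}\):

```python
import numpy as np

def update_w(w, x_i, y_i, x_j, y_j, da_i, da_j):
    """Equation (45): O(n) incremental maintenance of w = sum_s alpha_s y_s x_s,
    instead of recomputing (44) from scratch in O(l*n)."""
    return w + da_i * y_i * x_i + da_j * y_j * x_j
```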

When assigning an initial value, the constraints

\[
\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad
0 \le \alpha_i \le \gamma C \ (i \in I_+), \qquad
0 \le \alpha_j \le (1-\gamma) C \ (j \in I_-)
\tag{46}
\]

must hold; accordingly, set the initial point

\[ \alpha^{0} = \bigl[ \ldots, \alpha_i, \ldots, \alpha_j, \ldots \bigr]^{T} \in \mathbb{R}^{l}, \tag{47} \]

where \(\alpha_i = 1/t\) for positive samples and \(\alpha_j = 1/(l-t)\) for negative samples; \(l\) and \(t\) are the total number of samples and the number of positive samples, respectively, so that \(\sum_{i} \alpha_i y_i = 1 - 1 = 0\). Thus the weight vector \(w\) of the original problem is obtained by optimizing the Lagrangian multipliers \(\alpha\) of the dual problem.
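A sketch of the initialization (46)-(47). The construction guarantees \(\sum_i \alpha_i y_i = 1 - 1 = 0\); feasibility with respect to the box bounds additionally requires \(1/t \le \gamma C\) and \(1/(l-t) \le (1-\gamma)C\), which should be checked for the chosen \(C\) and \(\gamma\):

```python
import numpy as np

def initial_alpha(y):
    """Initial point (47): alpha_i = 1/t on positives, 1/(l - t) on negatives."""
    y = np.asarray(y)
    t = int(np.sum(y == 1))        # number of positive samples
    l = y.size                     # total number of samples
    return np.where(y == 1, 1.0 / t, 1.0 / (l - t))
```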

4.3. Description of Cost-Sensitive Support Vector Machine Using Randomized Dual Coordinate Descent Method. The accelerating conditions of Section 4.1 are checked first. One then chooses the pair of coordinates \(i, j\) to optimize, uses formula (43) to calculate \(d_i\), formulas (39) and (40) to update \(\alpha_i^{k}, \alpha_j^{k}\), and formula (45) to update \(w\). Each inner iteration therefore takes \(O(n)\) effort. The computer memory is mainly used to store the sample information \(x_1, x_2, \ldots, x_l\) and the inner products \(Q_{ii} = y_i x_i^{T} x_i y_i\). The cost-sensitive support vector machine using the randomized dual coordinate descent algorithm is described as follows.

Algorithm 3 (cost-sensitive support vector machine using randomized dual coordinate descent (CSVM-RDCD)).

Input: sample information \((x_i, y_i)_{i=1}^{l}\), \(\alpha^{0}\).
Output: \(\alpha\), \(w\).
Initialize \(\alpha^{0}\) and the corresponding \(w = \sum_{i=1}^{l} \alpha_i x_i y_i\).
For \(k = 1, 2, \ldots, T\) do

\[ \alpha^{k,1} \leftarrow \alpha^{k}. \tag{48} \]

Step 1. Randomly choose \(i, j \in \{1, 2, \ldots, l\}\), \(i \neq j\), and the corresponding \(y_i, y_j\).


Table 1: Specification of the benchmark datasets. "Number of ex" is the number of example data points; "Number of feat" is the number of features; "Ratio" is the class imbalance ratio; "Positive class" specifies the target or positive class; "Number" is the number of the dataset.

Dataset                       Number of ex   Number of feat   Ratio     Positive class           Number
KDD99 (intrusion detection)   5209460        42               1:4       Normal                   1
Web spam                      350000         254              1:2       -1                       2
MNIST                         70000          780              1:10      5                        3
Covertype                     581012         54               1:211     Cottonwood/Willow (4)    4
SIAM1                         28596          30438            1:2000    1, 6, 7, 11              5
SIAM11                        28596          30438            1:716     11, 12                   6

Step 2. Compute the feasible interval for the update:

if \(y_i = +1,\ y_j = -1\): \(U_1 = \max(0,\ \alpha_i - \alpha_j)\), \(V_1 = \min\bigl(\alpha_i - \alpha_j + (1-\gamma)C,\ \gamma C\bigr)\);
if \(y_i = +1,\ y_j = +1\): \(U_2 = \max(0,\ \alpha_i + \alpha_j - \gamma C)\), \(V_2 = \min(\alpha_i + \alpha_j,\ \gamma C)\);
if \(y_i = -1,\ y_j = -1\): \(U_3 = \max\bigl(0,\ \alpha_i + \alpha_j - (1-\gamma)C\bigr)\), \(V_3 = \min\bigl(\alpha_i + \alpha_j,\ (1-\gamma)C\bigr)\);
if \(y_i = -1,\ y_j = +1\): \(U_4 = \max(0,\ \alpha_i - \alpha_j)\), \(V_4 = \min\bigl(\gamma C + \alpha_i - \alpha_j,\ (1-\gamma)C\bigr)\).   (49)

Step 3. Compute the tentative update:

\[
\alpha_i^{\text{temp}} = \alpha_i^{k}
- \frac{2 w^{T} x_i y_i - 1 - \bigl(2 w^{T} x_j y_j - 1\bigr) y_i y_j}{2\,(Q_{ii} + Q_{jj})}.
\tag{50}
\]

Step 4. Select the case index and clip:

if \(y_i = +1,\ y_j = -1\), \(m = 1\);
if \(y_i = +1,\ y_j = +1\), \(m = 2\);
if \(y_i = -1,\ y_j = -1\), \(m = 3\);
if \(y_i = -1,\ y_j = +1\), \(m = 4\);

\[
\alpha_i^{k+1} =
\begin{cases}
V_m & \text{if } \alpha_i^{\text{temp}} > V_m, \\
\alpha_i^{\text{temp}} & \text{if } U_m \le \alpha_i^{\text{temp}} \le V_m, \\
U_m & \text{if } \alpha_i^{\text{temp}} < U_m.
\end{cases}
\tag{51}
\]

Step 5. Update the partner coordinate:

\[ \alpha_j^{k+1} = y_j \bigl( y_i \alpha_i^{k} + y_j \alpha_j^{k} - y_i \alpha_i^{k+1} \bigr). \tag{52} \]

Step 6. Update the weight vector:

\[ w \;\leftarrow\; w + \bigl(\alpha_i^{k+1} - \alpha_i^{k}\bigr) x_i y_i + \bigl(\alpha_j^{k+1} - \alpha_j^{k}\bigr) x_j y_j. \tag{53} \]

Until a stopping condition is satisfied:

\[ \alpha^{k+1} \leftarrow \alpha^{k}. \tag{54} \]

End for.
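Putting the pieces together, the following is a minimal dense-NumPy sketch of Algorithm 3, reusing the helpers `box_bounds`, `step_d_i`, `clip`, `partner_update`, `update_w`, and `initial_alpha` defined above. It is an illustrative reading of the pseudocode, not the authors' LibSVM-based implementation; a production version would use sparse features, implement the projected-gradient check of Condition 2, and apply a proper stopping test rather than a fixed iteration budget.

```python
import numpy as np

def csvm_rdcd(X, y, C, gamma, T=100000, seed=0):
    """Sketch of Algorithm 3 (CSVM-RDCD). X: (l, n) array; y: labels in {+1, -1}."""
    rng = np.random.default_rng(seed)
    l, n = X.shape
    C_pos, C_neg = gamma * C, (1.0 - gamma) * C
    alpha = initial_alpha(y)                        # initial point (47)
    Q_diag = np.einsum('ij,ij->i', X, X)            # Q_ii = x_i^T x_i (since y_i^2 = 1)
    w = X.T @ (alpha * y)                           # w = sum_s alpha_s y_s x_s, (41)
    for _ in range(T):
        i, j = rng.choice(l, size=2, replace=False)            # Step 1
        # Speed-up Condition 1: both multipliers stuck at the same bound
        if (alpha[i] == alpha[j] == 0.0) or (alpha[i] == alpha[j] == C):
            continue
        U, V = box_bounds(alpha[i], alpha[j], y[i], y[j], C_pos, C_neg)    # Step 2, (49)
        d_i = step_d_i(w, X[i], y[i], X[j], y[j], Q_diag[i], Q_diag[j])    # Step 3, (50)
        a_i_new = clip(alpha[i] + d_i, U, V)                               # Step 4, (51)
        a_j_new = partner_update(alpha[i], alpha[j], a_i_new, y[i], y[j])  # Step 5, (52)
        w = update_w(w, X[i], y[i], X[j], y[j],
                     a_i_new - alpha[i], a_j_new - alpha[j])               # Step 6, (53)
        alpha[i], alpha[j] = a_i_new, a_j_new
    return alpha, w
```

Prediction on a new example \(x\) then uses \(\operatorname{sign}(w^{T} x)\).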

5. Experiments and Analysis

5.1. Experiment on Big Class-Imbalanced Benchmark Dataset Classification Problems. In this section we analyze the performance of the proposed cost-sensitive support vector machine using the randomized dual coordinate descent method (CSVM-RDCD). We compare our implementation with state-of-the-art cost-sensitive support vector machines; three implementations of related cost-sensitive SVMs are compared. We implemented the proposed CSVM-RDCD by modifying the LibSVM [8, 9] source code; Eitrich and Lang [10] proposed the parallel cost-sensitive support vector machine (PCSVM); and the authors of [6] proposed the cost-sensitive support vector machine (CSSVM).

Table 1 lists the statistics of the datasets. KDD99, Web Spam, Covertype, MNIST, SIAM1, and SIAM11 are obtained from the following websites: KDD99 from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html; Web Spam from http://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html; Covertype from http://archive.ics.uci.edu/ml/datasets/Covertype; MNIST from http://yann.lecun.com/exdb/mnist/; SIAM from https://c3.nasa.gov/dashlink/resources/138/.

To evaluate the performance of the CSVM-RDCD method, we use a stratified selection to split each dataset into 9/10 for training and 1/10 for testing. We briefly describe each dataset below. For each dataset we choose the class with the higher cost or fewer data points as the target (positive) class. All multiclass datasets were converted to binary datasets. In particular, the binary datasets SIAM1 and SIAM11 were constructed from the same multiclass dataset but with different target classes and different imbalance ratios.


Table 2: Cost-sensitive parameters of the three algorithms using CSVM on the 6 datasets.

Dataset                       Cγ     C(1−γ)
KDD99 (intrusion detection)   4      1
Web spam                      2      1
MNIST                         10     1
Covertype                     211    1
SIAM1                         2000   1
SIAM11                        716    1

The performance of the three CSVM algorithms was evaluated by 10-fold cross-validation in terms of the average misclassification cost, the training time, and the recognition rate of the positive class (i.e., the recognition rate of the minority class).

Average misclassification cost (AMC) denotes the 10-fold cross-validation average misclassification cost on the test datasets for the related CSVMs, defined as follows:

\[ \text{AMC} = \frac{fp \times C\gamma + fn \times C(1-\gamma)}{N}, \tag{55} \]

where \(fp\) and \(fn\) represent the number of positive examples misclassified as negative and the number of negative examples misclassified as positive in the test dataset, respectively; \(C\gamma\) and \(C(1-\gamma)\) denote the corresponding misclassification costs; and \(N\) denotes the number of test examples.
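For instance, a direct transcription of (55), with illustrative names:

```python
def average_misclassification_cost(fp, fn, C_pos, C_neg, N):
    """AMC per (55): fp positives misclassified as negative (cost C*gamma each),
    fn negatives misclassified as positive (cost C*(1-gamma) each), N test examples."""
    return (fp * C_pos + fn * C_neg) / N
```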

The recognition rate of the positive class is the ratio of the number of correctly classified positive examples to the number of positive examples in the test dataset.

Training time in seconds is used to evaluate the convergence speed of the three CSVM algorithms on the same computer.

Cost-sensitive parameters of the three CSVM algorithms are specified in Table 2. The parameters \(C\gamma\) and \(C(1-\gamma)\) are set according to the class ratio of each dataset, namely the ratio between the minority class and the majority class.

Three datasets with relative class imbalance are examined, namely KDD99 (intrusion detection), Web spam, and MNIST. Three datasets with severe class imbalance are examined, namely Covertype, SIAM1, and SIAM11. The average misclassification cost comparison of the three algorithms using CSVM is shown in Figure 1 for each of the datasets. The CSVM-RDCD algorithm outperforms the PCSVM and CSSVM on all datasets.

The recognition rate of the positive class for the three algorithms using CSVM is shown in Figure 2 for each of the datasets. The CSVM-RDCD algorithm outperforms the CSSVM on all datasets, surpasses the PCSVM on four datasets, and ties with the PCSVM on the remaining two.

Figure 1: Average misclassification cost comparison of the three algorithms using CSVM on the 6 datasets.

Figure 2: The positive class recognition rate comparison of the three algorithms using CSVM.

We examine large datasets with both relative and severe imbalance ratios to evaluate the convergence speed of the CSVM-RDCD algorithm. The training time comparison of the three algorithms using CSVM is shown in Figure 3 for each of the datasets. The CSVM-RDCD algorithm outperforms the PCSVM and CSSVM on all datasets.

Figure 3: Training time (s) comparison of the three algorithms using CSVM.

5.2. Experiment on Real-World Big Class-Imbalanced Dataset Classification Problems. In order to verify the effectiveness of the proposed CSVM-RDCD algorithm on real-world big class-imbalanced data classification problems, it was evaluated using real vibration data measured on a wind turbine. The experimental data came from the SKF WindCon software and were collected from a wind turbine gearbox of type TF138-A [11]. The vibration signals were continuously acquired by an accelerometer mounted on the outer case of the gearbox.

Table 3: Specification of the benchmark dataset and cost-sensitive parameters on the dataset.

Dataset           Fault diagnosis of wind turbine
Number of ex      20000
Number of feat    56
Ratio             1:190
Positive class    Local crack fault
Cγ                190
C(1−γ)            1

All parameter settings for the dataset are listed in Table 3. The statistical results on the big class-imbalanced data problem that measure the quality of the results (average misclassification cost, recognition rate of the positive class, and training time) are listed in Table 4. From Table 4 it can be concluded that CSVM-RDCD consistently achieves superior performance on the big class-imbalanced data classification problem.

Table 4: Classification performance of the three algorithms using CSVM on the dataset.

Algorithm     AMC       Recognition of positive class   Training time
CSVM-RDCD     0.01825   98.5%                           5.43 s
PCSVM         0.0334    97.6%                           6.15 s
CSSVM         0.0379    97.2%                           7.34 s

Experimental results show that the randomized dual coordinate descent method is well suited to solving the cost-sensitive SVM dual problem on large-scale datasets. The proposed method achieves superior performance in average misclassification cost, recognition rate of the positive class, and training time. The large-scale experiments show that the cost-sensitive support vector machine using the randomized dual coordinate descent method runs more efficiently than both PCSVM and CSSVM; in particular, the randomized dual coordinate descent algorithm has an advantage in training time on large-scale datasets. CSSVM needs to build the full gradient and the kernel matrix \(Q\) and must select complex working sets during the solving process of the decomposition algorithm. The decomposition algorithm updates the full gradient information as a whole, with computational complexity \(O(ln)\) per full gradient update; PCSVM has similar computational complexity. The randomized dual coordinate descent method updates one coordinate at a time with computational complexity \(O(n)\), which considerably increases the convergence speed of the proposed method.

6. Conclusions

The randomized dual coordinate descent method (RDCD) is an optimization algorithm that updates the global solution by solving each subproblem analytically. The RDCD method has a rapid convergence rate, mainly for the following reasons: (1) the subproblem has a closed-form analytical solution, so no complex numerical optimization is required in the solving process; (2) each component update of the RDCD method builds on the previously updated component; compared with the CSSVM method, which updates the full gradient information as a whole, the objective function of the RDCD method can decline faster; (3) the single-coordinate gradient calculation of the RDCD method is simpler and cheaper than the full gradient calculation.

The randomized dual coordinate descent method is applied to the cost-sensitive support vector machine, which expands the scope of application of the randomized dual coordinate descent method. For large-scale class-imbalanced problems, a cost-sensitive SVM using the randomized dual coordinate descent method is proposed. Experimental results and analysis show the effectiveness and feasibility of the proposed method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Science Fund for Distinguished Young Scholars of China (no. 61025015), the National Natural Science Foundation of China (nos. 51305046 and 61304019), the Research Foundation of the Education Bureau of Hunan Province, China (no. 12A007), the Open Fund of the Hunan Province University Key Laboratory of Bridge Engineering (Changsha University of Science and Technology), the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province (Changsha University of Science and Technology), the Key Laboratory of Efficient and Clean Energy Utilization, College of Hunan Province, and the Introduction of Talent Fund of Changsha University of Science and Technology.

References

[1] M. A. Davenport, "The 2nu-SVM: a cost-sensitive extension of the nu-SVM," Tech. Rep. TREE 0504, Department of Electrical and Computer Engineering, Rice University, 2005.

[2] M. Kim, "Large margin cost-sensitive learning of conditional random fields," Pattern Recognition, vol. 43, no. 10, pp. 3683-3692, 2010.

[3] Y.-J. Park, S.-H. Chun, and B.-C. Kim, "Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis," Artificial Intelligence in Medicine, vol. 51, no. 2, pp. 133-145, 2011.

[4] J. Kim, K. Choi, G. Kim, and Y. Suh, "Classification cost: an empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost," Expert Systems with Applications, vol. 39, no. 4, pp. 4013-4019, 2012.

[5] C.-Y. Yang, J.-S. Yang, and J.-J. Wang, "Margin calibration in SVM class-imbalanced learning," Neurocomputing, vol. 73, no. 1-3, pp. 397-411, 2009.

[6] H. Masnadi-Shirazi, N. Vasconcelos, and A. Iranmehr, "Cost-sensitive support vector machines," http://arxiv.org/abs/1212.0975.

[7] Y. Artan, M. A. Haider, D. L. Langer et al., "Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2444-2455, 2010.

[8] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in Proceedings of the 25th International Conference on Machine Learning, pp. 408-415, July 2008.

[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: a library for large linear classification," The Journal of Machine Learning Research, vol. 9, pp. 1871-1874, 2008.

[10] T. Eitrich and B. Lang, "Parallel cost-sensitive support vector machine software for classification," in Proceedings of the Workshop from Computational Biophysics to Systems Biology, NIC Series, pp. 141-144, John von Neumann Institute for Computing, Jülich, Germany, 2006.

[11] B. Tang, W. Liu, and T. Song, "Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner-Ville distribution," Renewable Energy, vol. 35, no. 12, pp. 2862-2866, 2010.


Page 3: Research Article Cost-Sensitive Support Vector Machine Using …downloads.hindawi.com/journals/aaa/2014/416591.pdf · 2019-07-31 · Research Article Cost-Sensitive Support Vector

Abstract and Applied Analysis 3

3 Cost-Sensitive Support VectorMachine Using Randomized DualCoordinate Descent Method

In this section randomized dual coordinate descent methodis used to solve 2C-SVM that is one version of cost-sensitivesupport vectormachineTheoptimization process starts froman initial point 1205720 isin R119897 and generates a sequence of vectors120572119896 119896 = 0 infin The process from 120572119896 to 120572119896+1 is called

an outer iteration In each outer iteration 119897 is called inneriteration so that 120572

1 1205722 120572

119897are sequentially updated Each

outer iteration thus generates vectors120572119896119894 isin R119897 119894 = 1 2 119897+1 such that

1205721198961= 120572119896 119894 = 1

120572119896119894= [120572119896+1

1 120572

119896+1

119894minus1 120572119896

119894 120572

119896

119897] forall119894 = 2 119897

120572119896119897= 120572119896 119894 = 119897 + 1

(12)

For updating 120572119896119894 to 120572119896119894+1 the following one-variablesubproblem is solved as

min119889

119891 (120572119896119894+ 119889119890119894) (13)

where 119890119894= [0 1 0]

119879 This is general process ofone-variable coordinate descent method However the sumconstrained condition of the optimization problem (13) isdenoted as follows

119899

sum119894=1

120572119894119910119894= 0 (14)

where 120572119894is exactly determined by the other 120572

119894s and if we

were to hold 120572119894119899

119894=1 120572119894fixed then we cannot make any

change 120572119894without violating the constraint condition (14) in

the optimization problemThus if we want to update some subject of the 120572

119894s we

must update at least two of them simultaneously in orderto keep satisfying the constraints We currently have somesetting of the 120572

119894s that satisfy the constraint conditions (14)

and suppose we have decided to hold 120572119894119899

119894=1 120572119894 120572119895fixed and

reoptimize the dual problem of CSVM (10) with respect to 120572119894

120572119895(subject to the constraints) Equation (6) is reformatted as

120572119894119910119894+ 120572119895119910119895=

119897

sum119898=1

120572119898119910119898minus 120572119894119910119894minus 120572119895119910119895 (15)

Since the right hand side is fixed we can just let it bedenoted by some constant 120577

120577 =

119897

sum119898=1

120572119898119910119898minus 120572119894119910119894minus 120572119895119910119895 (16)

We can form the updated coordinates before and afterwith respect to 120572

119894 120572119895

120572119896

119894119910119894+ 120572119896

119895119910119895= 120572119896+1

119894119910119894+ 120572119896+1

119895119910119895= 120577 (17)

120572119896

119894119910119894 120572119896+1119894119910119894are updated coordinates before and after with

respect to 120572119896119894 Consider the following dual two-variable

subproblem of 2C-SVM

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895)

= 1198761198941198941198892

119894+ 2(120572

119896119894119876)119894minus 1 119889

119894

+ 1198761198951198951198892

119895+ 2(120572

119896119895119876)119895minus 1 119889

119895+ 119888

(18)

where 119888 is constant 119876119894= [119876119894119904 119876

119894119904minus1 119876119894119904 119876

119894119897] and

(120572119896119894119876)119894=

119897

sum119904=1

119876119894119904120572119896119904

119894 (19)

From (18) we obtain

1198911015840(120572) = 120572

119879119876 minus 1 (20)

120597119891 (120572119896119894)

120597120572119894

= 2(120572119896119894119876)119894minus 1 (21)

where 120597119891(120572119896119894)120597120572119894is the 119894th component of the gradient1198911015840(120572)

By incorporating (21) (18) is rewritten as

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895)

= 1198761198941198941198892

119894+120597119891 (120572119896119894)

120597120572119894

119889119894+ 1198761198951198951198892

119895+120597119891 (120572119896119895)

120597120572119895

119889119895+ 119888

(22)

From (17) we have 119889119895= minus119889

119894119910119894119910119895 Equation (22) is

reformed as follows

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895)

= 1198761198941198941198892

119894+120597119891 (120572119896119894)

120597120572119894

119889119894+ 1198761198951198951198892

119894minus120597119891 (120572119896119895)

120597120572119895

119889119894119910119894119910119895+ 119888

= (119876119894119894+ 119876119895119895) 1198892

119894+ (

120597119891 (120572119896119894)

120597120572119894

minus120597119891 (120572119896119895)

120597120572119895

119910119894119910119895)119889119894+ 119888

(23)

Treating (119876119894119894+119876119895119895) ((120597119891(120572119896119894)120597120572

119894) minus (120597119891(120572

119896119895)120597120572119895)119910119894119910119895)

119888 as constants we should be able to verify that this is justsome quadratic function in 119889

119894 We can easily maximize

this quadratic function by setting its derivative to zero andsolving the optimization problemThe following two-variableoptimization problem is obtained as

min119889119894

119891 (120572119896119894+ 119889119894119890119894) + 119891 (120572

119896119895+ 119889119895119890119895) (24)

4 Abstract and Applied Analysis

The closed form is derived as follows

119889119894=(120597119891 (120572

119896119894) 120597120572119894) minus (120597119891 (120572

119896119895) 120597120572119895) (119910119894119910119895)

2 (119876119894119894+ 119876119895119895)

(25)

We now consider the box constraints (10) of the two-variable optimization problem The box constraints as 0 le120572119894le 119862120574 119894 isin 119868

+ 0 le 120572

119894le 119862(1 minus 120574) 119894 isin 119868

minusare classified

to four boxes constraints according to the labels of examplesfrom some two coordinates

Firstly suppose that the labels of examples are as

119910119894= +1 119910

119895= minus1 (26)

The sum constraints of Lagrange multipliers according tothese labels are as

120572119894119910119894+ 120572119895119910119895= 120577 (27)

Another expressing of sum constraints of Lagrange mul-tipliers is as

120572119894minus 120572119895= 120577

or 120572119894= 120572119895+ 120577

(28)

The box constraints of Lagrangemultipliers are defined as

0 le 120572119894le 120574119862

0 le 120572119895le (1 minus 120574)119862

(29)

We obtain the new expressing of box constraints ofLagrange multipliers as the following

0 le 120577 le (1 minus 120574)119862

0 le 120572119894le 120572119894minus 120572119895minus (1 minus 120574)119862

or minus (1 minus 120574)119862 le 120577 le 0

120572119894minus 120572119895le 120572119894le 120574119862

(30)

Thus we obtain stricter box constraints of Lagrangemultipliers 120572

119894 120572119895according to 119910

119894= +1 119910

119895= minus1 as the

following

1198801= max (0 120572

119894minus 120572119895)

1198811= min (120572

119894minus 120572119895minus (1 minus 120574)119862 120574119862)

(31)

Secondly suppose that the labels of examples are as

119910119894= +1 119910

119895= +1 (32)

Similarly similar stricter box constraints of Lagrangemultipliers 120572

119894 120572119895are obtained as follows

1198802= max (0 120572

119894+ 120572119895minus 120574119862)

1198812= min (120572

119894+ 120572119895 120574119862)

(33)

Thirdly suppose that the labels of examples are as

119910119894= minus1 119910

119895= minus1 (34)

Similarly the similar stricter box constraints of Lagrangemultipliers 120572

119894 120572119895are obtained as follows

1198803= max (0 minus120572

119894minus 120572119895minus (1 minus 120574)119862)

1198813= min (minus120572

119894minus 120572119895 (1 minus 120574)119862)

(35)

Finally suppose that the labels of examples are as

119910119894= minus1 119910

119895= +1 (36)

Similarly the similar stricter box constraints of Lagrangemultipliers 120572

119894 120572119895are obtained as follows

1198804= max (0 minus120572

119894minus 120572119895)

1198814= min (120574119862 minus 120572

119894minus 120572119895 (1 minus 120574) 119862)

(37)

For the simplification set

120572temp119894

= 120572119896119894

119894+ 119889119894 (38)

120572119896119894+1

119894is defined as the temp solution which would be edited

to satisfy

120572119896119894+1

119894=

119881119894 if 120572temp

119894gt 119881119894 119894 = 1 2 3 4

120572temp119894 if 119880

119894lt 120572

temp119894

lt 119881119894 119894 = 1 2 3 4

119880119894 if 120572temp

119894lt 119880119894 119894 = 1 2 3 4

(39)

From linear constraints with respect to 120572119896119894+1119894

in (17) thevalue of 120572119896119895+1

119894is obtained as

120572119896119895+1

119894= 119910119895(119910119894120572119896119894

119894+ 119910119895120572119896119895

119894minus 119910119894120572119896119894+1

119894) (40)

4 The Modified Proposed Method

41 Two Speeding-Up Strategies To avoid duplicate andinvalid iterative process calculations the two given condi-tions are not updated If one of two conditions ismet then thealgorithm skips this iteration that can significantly reduce thecomputational amount and accelerate the convergence speed

Condition 1 If 120572119896119894119894= 120572119896119895

119894= 0 or 120572119896119894

119894= 120572119896119895

119894= 119862 then the

constrained conditions are not updated

Abstract and Applied Analysis 5

Due to the box constraints of (10) there will appear a lotof boundary points of coordinates (120572119896119894

119894= 119862 or 120572119896119894

119894= 0) in

computing process If there are two coordinates (120572119896119894119894and 120572119896119894119894)

as the value of 0 or 119862 in an iteration process the analyticalsolution of the two subvariables updates coordinates withoutcalculating The reason is that the formula (17) guarantees120572119896119894

119894+ 120572119896119895

119894= 0 or 120572119896119894

119894+ 120572119896119895

119894= 2119862 while double restricted

box constrained optimization 0 le 120572119896119894119894le 119862 if the result is 0 or

119862 Constrained conditions will be edited ultimately as 0 or 119862The constrained conditions are not updated

Condition 2 If projected gradient is 0 the constrainedconditions are not updated

42 Dual Problem of Cost-Sensitive Support Vector MachineUsing Randomized Dual Coordinate Descent Method and ItsComplexity Analysis From the above algorithmderivation ofview solving of CSVM seems to have been successful but thecomputational complexity of the solving process is also largerAssume that the average value of nonzero feature of eachsample is 119899 Firstly the computational complexity of the innerproduct matrix 119876

119894119894= 119910119894119909119879

119894119909119894119910119894 119894 = 1 2 119897 is 119874(119897119899) but the

process can be operated in advance and stored into memorySecondly the calculation of (120572119896119894119876)

119894= sum119897

119904=1119876119894119904120572119896119904

119894takes the

computational complexity 119874(119897119899) The amount of calculationis very great when the data size is large However there is alinear relationship model CSVM

119908 =

119897

sum119904=1

120572119896119904

119894119909119904119910119904 (41)

Thus (120572119896119894119876)119894is further simplified as follows

(120572119896119894119876)119894=

119897

sum119904=1

119876119894119904120572119896119904

119894=

119897

sum119904=1

(120572119896119904

119894119910119904119909119879

119904) 119909119904119910119904= 119908119879119909119894119910119894 (42)

Solving the corresponding formula (23) can be simplifiedas follows

119889119894=2(120572119896119894119876)119894minus 1 minus (2(120572

119896119895119876)119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

=2119908119879119909119894119910119894minus 1 minus (2119908

119879119909119895119910119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

(43)

As can be seen computing119889119894with the complexity of119874(119897119899)

becomes computing 119908119879119909119894119910119894with the complexity of 119874(119899)

Thus the calculation times reduce 119897 minus 1 However

119908 =

119897

sum119904=1

120572119896119904

119894119909119904119910119904 (44)

where 119908 is the updated still computational complexity119874(119897119899)The amount of calculation can be reduced significantly when119908 is updated by changing 120572119896119904

119894 Let 120572119896119894

119894 120572119896119895119894

be the values of

the current selection 120572119896119894+1119894

120572119896119895+1119894

are updated values whichcan be updated via a simple way

119908 larr997888 119908 + (120572119896119894

119894minus 120572119896119894+1

119894) 119909119894119910119894+ (120572119896119895

119894minus 120572119896119895+1

119894) 119909119895119910119895 (45)

Its computational complexity only is 119874(119899) So whethercalculating 119889

119894or updating 119908 coordinate gradient compu-

tation complexity is 119874(119899) which is one of the coordinategradient method rapid convergence speed reasons

When assigned an initial value and constraints based on

119899

sum119894=1

120572119894119910119894= 0

0 le 120572119894le 120574119862

0 le 120572119895le (1 minus 120574)119862

(46)

set the initial point

1205720= [ 120572

119894 120572

119895 ]

119879

1times119897 (47)

where 120572119894= 1119905 120572

119895= (1(119897 minus 119905))119897 119905 are the total number

of samples and the number of positive samples respectivelyThus the weight vector 119908 of the original problem is obtainedby optimizing Lagrangian multipliers 120572 of the dual problem

43 Description of Cost-Sensitive Support Vector MachineUsing Randomized Dual Coordinate Descent Method Accel-erated conditions are judged by Section 41 One chooses thecoordinate optimized number 119894 119895 We use formula (43) tocalculate119889

119894 formulas (39) and (40) to update120572119896119894

119894120572119896119895119894 and the

formula (45) to update 119908 respectively We can see that inneriteration takes 119874(119899) effort The computer memory is mainlyused to store samples information 119909

1 1199092 119909

119897and each

sample point and their inner products 119876119894119894= 119910119894119909119879

119894119909119894119910119894 Cost-

sensitive support vector machine using dual randomizedcoordinate gradient descent algorithm is described as follows

Algorithm 3 Cost-sensitive support vector machine usingrandomized dual coordinate descent algorithm (CSVM-RDCD)

Input sample information (119909119894 119910119894)119897

119894=1 1205720

Output 120572119879119908

Initialize 1205720 and the corresponding 119908 = sum119897119894=1120572119894119909119894119910119894

For 119896 = 1 2 119879 do

1205721198961larr997888 120572

119896 (48)

Step 1 Randomly choose 119894 119895 isin 1 2 119897 119894 = 119895 and thecorresponding 119910

119894 119910119895

6 Abstract and Applied Analysis

Table 1 Specification of the benchmark datasets Number of ex is the number of example data points Number of feat is the number offeatures Ratio is the class imbalance ratio Target specifies the target or positive class Number is the number of the datasets

Dataset Number of ex Number of feat Ratio Positive class NumberKDD99 (intrusion detection) 5209460 42 1 4 Normal 1Web span 350000 254 1 2 minus1 2MNIST 70000 780 1 10 5 3Covertype 581012 54 1 211 CottonwoodWillow(4) 4SIAM1 28596 30438 1 2000 1 6 7 11 5SIAM11 28596 30438 1 716 11 12 6

Step 2

If (119910119894= +1 119910

119895= minus1)

1198801= max (0 120572

119894minus 120572119895)

1198811= min (120572

119894minus 120572119895minus (1 minus 120574)119862 120574119862)

if (119910119894= +1 119910

119895= +1)

1198802= max (0 120572

119894+ 120572119895minus 120574119862)

1198812= min (120572

119894+ 120572119895 120574119862)

if (119910119894= minus1 119910

119895= minus1)

1198803= max (0 minus120572

119894minus 120572119895minus (1 minus 120574)119862)

1198813= min (minus120572

119894minus 120572119895 (1 minus 120574) 119862)

if (119910119894= minus1 119910

119895= +1)

1198804= max (0 minus120572

119894minus 120572119895)

1198814= min (120574119862 minus 120572

119894minus 120572119895 (1 minus 120574) 119862)

(49)

Step 3

120572temp119894

= 120572119896119894

119894+2119908119879119909119894119910119894minus 1 minus (2119908

119879119909119895119910119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

(50)

Step 4

if (119910119894= 1 119910

119895= minus1)119898 = 1

if (119910119894= 1 119910

119895= 1)119898 = 2

if (119910119894= minus1 119910

119895= minus1)119898 = 3

119891 (119910119894= minus1 119910

119895= +1)119898 = 4

120572119896119894+1

119894=

119881119898 if 120572temp

119894gt 119881119898

120572temp119894 if 119880

119898lt 120572

temp119894

lt 119881119898

119880119898 if 120572temp

119894lt 119880119898

(51)

Step 5

120572119896119895+1

119894= 119910119895119910119894120572119896119894

119894+ 119910119895120572119896119895

119895minus 119910119894120572119896119894+1

119894 (52)

Step 6

119908 larr997888 119908 + (120572119896119894

119894minus 120572119896119894+1

119894) 119909119894119910119894+ (120572119896119895

119894minus 120572119896119895+1

119894) 119909119895119910119895 (53)

Until A stopping condition is satisfied

120572119896+1larr997888 120572

119896 (54)

End for

5 Experiments and Analysis

51 Experiment on Big Class-Imbalanced Benchmark DatasetsClassification Problem In this section we analyze the perfor-mance of the proposed cost-sensitive support vectormachineusing randomized dual coordinate descent method (CSVM-RDCD) We compare our implementation with the state-of-the-art cost-sensitive support vector Three implementationsof related cost-sensitive SVMs are compared We proposedthe cost-sensitive SVM using randomized dual coordinatedescent method (CSVM-RDCD) by modifying the LibSVM[8 9] source code Eitrich and Lang [10] proposed parallelcost-sensitive support vector machine (PCSVM) [6] pro-posed cost-sensitive support vector machines (CSSVM)

Table 1 lists the statistics of data sets KDD99 WebSpam Covertype MNIST SIAM1 and SIAM11 are obtainedfrom the following data website KDD99 is at httpkddicsuciedudatabaseskddcup99kddcup99html Web Spam isat httpwwwccgatecheduprojectsdoiWebbSpamCorpushtml Covertype is at httparchiveicsuciedumldatasetsCovertype MNIST is at httpyannlecuncomexdbmnistSIAM is at httpsc3nasagovdashlinkresources138

To evaluate the performance of CSVM-RDCD methodwe use a stratified selection to split each dataset to 910training and 110 testing We briefly describe each set belowFor each dataset we choose the class with the higher cost orfewer data points as the target or positive class All multiclassdatasets were converted to binary data sets In particular thebinary datasets SIAM1 and SIAM2 are datasets which havebeen constructed from the same multiclass dataset but withdifferent target class and different imbalance ratios

Abstract and Applied Analysis 7

Table 2 Cost-sensitive parameters of three algorithms using CSVMon 6 datasets

Dataset 119862120574 119862(1 minus 120574)

KDD99 (intrusion detection) 4 1Web span 2 1MNIST 10 1Covertype 211 1SIAM1 2000 1SIAM11 716 1

Evaluation of the performance of the three algorithmsusing CSVM was 10-fold cross-validation tested averagemisclassification cost the training time and the recognitionrate of the positive class (ie the recognition rate of theminority class)

Average misclassification cost (AMC) represents 10-foldcross-validation averagemisclassification cost on test datasetsfor related CSVMs described as follows

AMC =119891119901 times 119862120574 + 119891119899 times 119862 (1 minus 120574)

119873 (55)

where 119891119901 and 119891119899 represent the number of the positiveexamples misclassified as the negative examples in the testdataset and the number of the negative examplesmisclassifiedas the positive examples in the test data set respectively119862120574 and 119862(1 minus 120574) denote the cost of the positive examplesmisclassified as the negative examples in the test data setand the cost of the negative examples misclassified as thepositive examples in the test data set respectively119873 denotesthe number of test examples

The recognition rate of the positive class is the numberof classified positive classes and the number of the positiveclasses on testing dataset

Training time in seconds is used to evaluate the conver-gence speeding of three algorithms using CSVM on the samecomputer

Cost-sensitive parameters of three algorithms usingCSVM are specified as the following Table 2 Cost-sensitiveparameters 119862120574 and 119862(1 minus 120574) are valued according to the classratio of datasets namely the class ratio of minority class andmajority class

Three datasets with relative class imbalance are exam-ined Namely KDD99 (intrusion detection) Web span andMNIST datasets are considered Three datasets with severeclass imbalance are examined Namely Covertype SIAM1and SIAM11 datasets are considered The average misclassi-fication cost comparison of three algorithms using CSVMis shown in Figure 1 for each of the datasets The CSVM-RDCD algorithm outperforms the PCSVM and CSVM on alldatasets

The recognition rate of positive class comparison of threealgorithms using CSVM is shown in Figure 2 for each ofthe datasets The CSVM-RDCD algorithm outperforms thePCSVM and CSSVM on all datasets surpasses the PCSVMon four datasets and ties with the PCSVM on two datasets

We examine large datasets with relative imbalance ratiosand severe imbalance ratios to evaluate the convergence

1 2 3 4 5 6003

0035

004

0045

005

0055

006

0065

007

0075

AM

C

AMC

CSVM-RDCDPCSVMCSSVM

Datasets

Figure 1 Average misclassification cost comparison of three algo-rithms using CSVM on 6 datasets

1 2 3 4 5 60972

0974

0976

0978

098

0982

0984

0986

0988

099

0992

Datasets

Reco

gniti

on ra

te o

f pos

itive

clas

s

Recognition rate of positive class

CSVM-RDCDPCSVMCSSVM

Figure 2 The positive class recognition rate comparison of threealgorithms using CSVM

speed of CSVM-RDCD algorithm The training time com-parison of three algorithms using CSVM is shown in Figure 3for each of the datasetsTheCSVM-RDCD algorithm outper-forms the PCSVM and CSSVM on all datasets

52 Experiment on Real-World Big Class-Imbalanced DatasetClassification Problems In order to verify the effectivenessof the proposed algorithm CSVM-RDCD on real-worldbig class-imbalanced data classification problems it was

8 Abstract and Applied Analysis

1 2 3 4 5 60

5

10

15

20

25

30

Trai

ning

tim

e (s)

Training time

Datasets

CSVM-RDCDPCSVMCSSVM

Figure 3 Training time (s) comparison of three algorithms usingCSSVM

Table 3 Specification of the benchmark datasets and cost-sensitiveparameters on the dataset

Dataset Fault diagnosis of wind turbineNumber of ex 20000Number of feat 56Ratio 190Positive class Local crack fault119862120574 190119862(1 minus 120574) 1

evaluated using the real vibration data measured in the windturbine The experimental data were from the SKF Wind-Con software and collected from a wind turbine gearboxtype TF138-A [11] The vibration signals were continuouslyacquired by an accelerometer mounted on the outer case ofthe gearbox

All parameter settings for the dataset are listed in Table 3The statistical results of the big class-imbalanced data prob-lems that measure the quality of results (average misclassi-fication cost recognition rate of positive class and trainingtime) are listed in Table 4 From Table 4 it can be concludedthat CSVM-RDCD is able to consistently achieve superiorperformance in the big class-imbalanced data classificationproblems

Experimental results show that it is applicable to solvecost-sensitive SVM dual problem using randomized dualcoordinate descent method on the large-scale experimentaldata sets The proposed method can achieve superior perfor-mance in the average misclassification cost recognition rateof positive class and training time Large-scale experimentaldata sets show that cost-sensitive support vector machinesusing randomized dual coordinate descent method run

Table 4 Classification performance of three algorithms usingCSVM on the dataset

Name ofalgorithms AMC Recognition of

positive class Training time

CSVM-RDCD 001825 985 543 sPCSVM 00334 976 615 sCSSVM 00379 972 734 s

more efficiently than both PCSVM and CSSVM especiallyrandomized dual coordinate descent algorithmhas advantageof training time on large-scale data sets CSSVM needs tobuild complex whole gradient and kernel matrix 119876 andneeds to select the set of complex work in solving process ofdecomposition algorithm Decomposition algorithm updatesfull uniform gradient information as a whole the computa-tional complexity 119874(119897119899) for the full gradient update PCSVMalso has similar computational complexity Randomized dualcoordinate gradient method updates linearly the coordinatesits computational complexity as 119874(119899) which increases con-siderably the convergence speed of the proposed method

6 Conclusions

Randomized dual coordinate descentmethod (RDCD) is theoptimization algorithm to update the global solution which isobtained by solving an analytical solution of the suboptimalproblemThe RDCDmethod has the rapid convergence ratewhich is mainly due to the following (1) the subproblemhas formal analytical solution which is solved in solvingprocess without complex numerical optimization (2) thenext component of RDCD method in solving process isupdated on the basis of a previous component comparedwith the full gradient information updated CSSVM methodas a whole the objective function of RDCD method candecline faster (3) the single coordinate gradient calculationof RDCDmethod is simpler and easier than the full gradientcalculation

Randomized dual coordinate descent method is appliedto cost-sensitive support vector machine which expandedthe scope of application of the randomized dual coordinatedescent method For large-scale class-imbalanced problema cost-sensitive SVM using randomized dual coordinatedescent method is proposed Experimental results and anal-ysis show the effectiveness and feasibility of the proposedmethod

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the National ScienceFund for Distinguished Young Scholars of China (no61025015) the National Natural Science Foundation of China

Abstract and Applied Analysis 9

(nos 51305046 and 61304019) the Research Foundation ofEducation Bureau of Hunan Province China (no 12A007)Open Fund of Hunan Province University Key Laboratoryof Bridge Engineering (Changsha University of Scienceand Technology) the Key Laboratory of Renewable EnergyElectric-Technology of Hunan Province (Changsha Univer-sity of Science and Technology) the Key Laboratory ofEfficient and Clean Energy Utilization College of HunanProvince and the Introduction of Talent Fund of ChangshaUniversity of Science and Technology

References

[1] M A Davenport ldquoThe 2nu-SVM a cost-sensitive extension ofthe nu-SVMrdquo Tech Rep TREE 0504 Department of Electricaland Computer Engineering Rice University 2005

[2] M Kim ldquoLarge margin cost-sensitive learning of conditionalrandom fieldsrdquo Pattern Recognition vol 43 no 10 pp 3683ndash3692 2010

[3] Y-J Park S-H Chun and B-C Kim ldquoCost-sensitive case-based reasoning using a genetic algorithm application tomedical diagnosisrdquoArtificial Intelligence inMedicine vol 51 no2 pp 133ndash145 2011

[4] J Kim K Choi G Kim and Y Suh ldquoClassification costan empirical comparison among traditional classifier Cost-Sensitive Classifier and MetaCostrdquo Expert Systems with Appli-cations vol 39 no 4 pp 4013ndash4019 2012

[5] C-Y Yang J-S Yang and J-J Wang ldquoMargin calibration inSVM class-imbalanced learningrdquo Neurocomputing vol 73 no1ndash3 pp 397ndash411 2009

[6] H Masnadi-Shirazi N Vasconcelos and A Iranmehr ldquoCost-sensitive supportvector machinesrdquo httparxivorgabs12120975

[7] Y Artan M A Haider D L Langer et al ldquoProstate cancerlocalization with multispectral MRI using cost-sensitive sup-port vector machines and conditional random fieldsrdquo IEEETransactions on Image Processing vol 19 no 9 pp 2444ndash24552010

[8] C-J Hsieh K-W Chang C-J Lin S S Keerthi and SSundararajan ldquoA dual coordinate descent method for large-scale linear SVMrdquo in Proceedings of the 25th InternationalConference on Machine Learning pp 408ndash415 July 2008

[9] R-E Fan K-W Chang C-J Hsieh X-R Wang and C-JLin ldquoLIBLINEAR a library for large linear classificationrdquo TheJournal of Machine Learning Research vol 9 pp 1871ndash18742008

[10] T Eitrich and B Lang ldquoParallel cost-sensitive support vectormachine software for classificationrdquo in Proceedings of the Work-shop from Computational Biophysics to Systems Biology NICSeries pp 141ndash144 John vonNeumann Institute forComputingJulich Germany 2006

[11] B Tang W Liu and T Song ldquoWind turbine fault diagnosisbased on Morlet wavelet transformation and Wigner-Villedistributionrdquo Renewable Energy vol 35 no 12 pp 2862ndash28662010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Cost-Sensitive Support Vector Machine Using …downloads.hindawi.com/journals/aaa/2014/416591.pdf · 2019-07-31 · Research Article Cost-Sensitive Support Vector

4 Abstract and Applied Analysis

The closed form is derived as follows

119889119894=(120597119891 (120572

119896119894) 120597120572119894) minus (120597119891 (120572

119896119895) 120597120572119895) (119910119894119910119895)

2 (119876119894119894+ 119876119895119895)

(25)

We now consider the box constraints (10) of the two-variable optimization problem. The box constraints $0 \le \alpha_i \le C\gamma$ for $i \in I_+$ and $0 \le \alpha_i \le C(1-\gamma)$ for $i \in I_-$ fall into four cases according to the labels of the two selected coordinates.

Firstly, suppose that the labels of the examples are

$$y_i = +1, \quad y_j = -1. \tag{26}$$

The sum constraint on the Lagrange multipliers for these labels is

$$\alpha_i y_i + \alpha_j y_j = \zeta. \tag{27}$$

Equivalently, the sum constraint can be written as

$$\alpha_i - \alpha_j = \zeta, \quad \text{or} \quad \alpha_i = \alpha_j + \zeta. \tag{28}$$

The box constraints on the Lagrange multipliers are defined as

$$0 \le \alpha_i \le \gamma C, \qquad 0 \le \alpha_j \le (1-\gamma) C. \tag{29}$$

Substituting (28) into (29) gives a new expression of the box constraints:

$$\max(0, \zeta) \le \alpha_i \le \min\left( \zeta + (1-\gamma) C,\ \gamma C \right). \tag{30}$$

Thus we obtain stricter box constraints on the Lagrange multipliers $\alpha_i$, $\alpha_j$ for $y_i = +1$, $y_j = -1$:

$$U_1 = \max(0,\ \alpha_i - \alpha_j), \qquad V_1 = \min\left( \alpha_i - \alpha_j + (1-\gamma) C,\ \gamma C \right). \tag{31}$$

Secondly, suppose that the labels of the examples are

$$y_i = +1, \quad y_j = +1. \tag{32}$$

Similarly, stricter box constraints on $\alpha_i$, $\alpha_j$ are obtained:

$$U_2 = \max(0,\ \alpha_i + \alpha_j - \gamma C), \qquad V_2 = \min\left( \alpha_i + \alpha_j,\ \gamma C \right). \tag{33}$$

Thirdly, suppose that the labels of the examples are

$$y_i = -1, \quad y_j = -1. \tag{34}$$

Similarly, stricter box constraints on $\alpha_i$, $\alpha_j$ are obtained:

$$U_3 = \max\left(0,\ \alpha_i + \alpha_j - (1-\gamma) C\right), \qquad V_3 = \min\left( \alpha_i + \alpha_j,\ (1-\gamma) C \right). \tag{35}$$

Finally, suppose that the labels of the examples are

$$y_i = -1, \quad y_j = +1. \tag{36}$$

Similarly, stricter box constraints on $\alpha_i$, $\alpha_j$ are obtained:

$$U_4 = \max(0,\ \alpha_i - \alpha_j), \qquad V_4 = \min\left( \gamma C + \alpha_i - \alpha_j,\ (1-\gamma) C \right). \tag{37}$$
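For illustration, the four interval pairs (31), (33), (35), and (37) can be collected in a single helper. The following Python sketch assumes the corrected bounds derived above; it is an illustration, not code from the paper.

def box_bounds(a_i, a_j, y_i, y_j, C, gamma):
    # Feasible interval [U, V] for the updated alpha_i, by label case: (31), (33), (35), (37).
    if y_i == +1 and y_j == -1:
        U = max(0.0, a_i - a_j)
        V = min(a_i - a_j + (1.0 - gamma) * C, gamma * C)
    elif y_i == +1 and y_j == +1:
        U = max(0.0, a_i + a_j - gamma * C)
        V = min(a_i + a_j, gamma * C)
    elif y_i == -1 and y_j == -1:
        U = max(0.0, a_i + a_j - (1.0 - gamma) * C)
        V = min(a_i + a_j, (1.0 - gamma) * C)
    else:  # y_i == -1, y_j == +1
        U = max(0.0, a_i - a_j)
        V = min(gamma * C + a_i - a_j, (1.0 - gamma) * C)
    return U, V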

For simplification, set

$$\alpha_i^{\text{temp}} = \alpha_i^{k_i} + d_i. \tag{38}$$

$\alpha_i^{k_i+1}$ is defined as this temporary solution edited to satisfy the box constraints:

$$\alpha_i^{k_i+1} = \begin{cases} V_m, & \text{if } \alpha_i^{\text{temp}} > V_m, \\ \alpha_i^{\text{temp}}, & \text{if } U_m < \alpha_i^{\text{temp}} < V_m, \\ U_m, & \text{if } \alpha_i^{\text{temp}} < U_m, \end{cases} \qquad m = 1, 2, 3, 4. \tag{39}$$

From the linear constraint on $\alpha_i^{k_i+1}$ in (17), the value of $\alpha_j^{k_j+1}$ is obtained as

$$\alpha_j^{k_j+1} = y_j \left( y_i \alpha_i^{k_i} + y_j \alpha_j^{k_j} - y_i \alpha_i^{k_i+1} \right). \tag{40}$$
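The edit (38)-(39) and the recovery of the partner multiplier (40) then take only a few lines; this sketch reuses the box_bounds helper above and is likewise only illustrative.

def update_pair(a_i, a_j, y_i, y_j, d_i, C, gamma):
    # Clip alpha_i^temp = alpha_i + d_i into [U, V] as in (38)-(39),
    # then recover alpha_j from the linear constraint as in (40).
    U, V = box_bounds(a_i, a_j, y_i, y_j, C, gamma)
    a_i_new = min(max(a_i + d_i, U), V)
    a_j_new = y_j * (y_i * a_i + y_j * a_j - y_i * a_i_new)
    return a_i_new, a_j_new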

4. The Modified Proposed Method

4.1. Two Speeding-Up Strategies. To avoid duplicate and invalid calculations in the iterative process, two skip conditions are checked before each update. If either condition is met, the algorithm skips the iteration, which significantly reduces the computational load and accelerates convergence.

Condition 1. If $\alpha_i^{k_i} = \alpha_j^{k_j} = 0$ or $\alpha_i^{k_i} = \alpha_j^{k_j} = C$, then the constrained conditions are not updated.


Due to the box constraints of (10), many coordinates hit the boundary values ($\alpha_i^{k_i} = C$ or $\alpha_i^{k_i} = 0$) during the computation. If the two selected coordinates $\alpha_i^{k_i}$ and $\alpha_j^{k_j}$ both take the value $0$ or $C$ in an iteration, the analytical solution of the two-variable subproblem leaves them unchanged, so the pair can be skipped without calculation. The reason is that formula (17) guarantees $\alpha_i^{k_i} + \alpha_j^{k_j} = 0$ or $\alpha_i^{k_i} + \alpha_j^{k_j} = 2C$, while the box constraint $0 \le \alpha_i^{k_i} \le C$ forces the clipped result back to $0$ or $C$; the constrained conditions are therefore not updated.

Condition 2. If the projected gradient is $0$, the constrained conditions are not updated.
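A minimal sketch of the two skip tests, assuming the projected gradient of the selected pair is available in a variable proj_grad (a hypothetical name) and using a small tolerance for the zero test:

def should_skip(a_i, a_j, C, proj_grad, tol=1e-12):
    # Condition 1: both multipliers sit at the same bound (0 or C).
    # Condition 2: the projected gradient vanishes.
    at_lower = (a_i == 0.0 and a_j == 0.0)
    at_upper = (a_i == C and a_j == C)
    return at_lower or at_upper or abs(proj_grad) < tol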

4.2. Dual Problem of the Cost-Sensitive Support Vector Machine Using the Randomized Dual Coordinate Descent Method and Its Complexity Analysis. From the viewpoint of the above derivation, the solution of CSVM appears complete, but the computational cost of the solving process is still large. Assume that the average number of nonzero features per sample is $n$. Firstly, computing the inner products $Q_{ii} = y_i x_i^T x_i y_i$, $i = 1, 2, \ldots, l$, costs $O(ln)$, but this can be done in advance and the results stored in memory. Secondly, the calculation of $(\alpha^k Q)_i = \sum_{s=1}^{l} Q_{is} \alpha_s^{k_s}$ costs $O(ln)$ per iteration, which is a very large amount of computation when the data size is large. However, the CSVM model has the linear relationship

$$w = \sum_{s=1}^{l} \alpha_s^{k_s} x_s y_s. \tag{41}$$

Thus $(\alpha^k Q)_i$ is further simplified as follows:

$$(\alpha^k Q)_i = \sum_{s=1}^{l} Q_{is} \alpha_s^{k_s} = y_i x_i^T \sum_{s=1}^{l} \alpha_s^{k_s} y_s x_s = w^T x_i y_i. \tag{42}$$

Correspondingly, formula (23) can be simplified as follows:

$$d_i = \frac{2 (\alpha^k Q)_i - 1 - \left( 2 (\alpha^k Q)_j - 1 \right) y_i y_j}{2 \left( Q_{ii} + Q_{jj} \right)} = \frac{2 w^T x_i y_i - 1 - \left( 2 w^T x_j y_j - 1 \right) y_i y_j}{2 \left( Q_{ii} + Q_{jj} \right)}. \tag{43}$$

As can be seen, computing $d_i$ with complexity $O(ln)$ becomes computing $w^T x_i y_i$ with complexity $O(n)$; the amount of computation is thus reduced by a factor of about $l$. However, recomputing

$$w = \sum_{s=1}^{l} \alpha_s^{k_s} x_s y_s \tag{44}$$

from scratch still costs $O(ln)$. The amount of calculation can be reduced significantly when $w$ is updated incrementally through the changed multipliers. Let $\alpha_i^{k_i}$, $\alpha_j^{k_j}$ be the currently selected values and $\alpha_i^{k_i+1}$, $\alpha_j^{k_j+1}$ the updated values; then $w$ can be updated in a simple way:

$$w \leftarrow w + \left( \alpha_i^{k_i+1} - \alpha_i^{k_i} \right) x_i y_i + \left( \alpha_j^{k_j+1} - \alpha_j^{k_j} \right) x_j y_j. \tag{45}$$

Its computational complexity is only $O(n)$. So whether calculating $d_i$ or updating $w$, each coordinate gradient computation costs $O(n)$, which is one of the reasons for the rapid convergence of the coordinate gradient method.
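The $O(n)$ evaluation of (43) might look as follows in NumPy, where Q_diag holds the precomputed $Q_{ii}$ and the incremental update (45) is applied by the caller. This is a sketch under the paper's linear-kernel setting, not the authors' implementation.

import numpy as np

def direction(w, X, y, i, j, Q_diag):
    # d_i of (43): each term 2*(alpha Q)_i - 1 reduces to 2*w^T x_i y_i - 1 by (42),
    # so the whole step costs O(n) instead of O(ln).
    g_i = 2.0 * y[i] * (w @ X[i]) - 1.0
    g_j = 2.0 * y[j] * (w @ X[j]) - 1.0
    return (g_i - g_j * y[i] * y[j]) / (2.0 * (Q_diag[i] + Q_diag[j]))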

The initial value must satisfy the constraints

$$\sum_{i=1}^{l} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le \gamma C, \ i \in I_+, \qquad 0 \le \alpha_j \le (1-\gamma) C, \ j \in I_-. \tag{46}$$

Set the initial point

$$\alpha^0 = \left[ \ldots, \alpha_i, \ldots, \alpha_j, \ldots \right]^T \in \mathbb{R}^{l}, \tag{47}$$

where $\alpha_i = 1/t$ for the positive examples and $\alpha_j = 1/(l-t)$ for the negative examples, and $l$, $t$ are the total number of samples and the number of positive samples, respectively; this choice makes $\sum_{i=1}^{l} \alpha_i y_i = t \cdot (1/t) - (l-t) \cdot 1/(l-t) = 0$. Thus the weight vector $w$ of the original problem is obtained by optimizing the Lagrange multipliers $\alpha$ of the dual problem.
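A small sketch of this starting point; the assertion checks the equality constraint of (46).

import numpy as np

def initial_alpha(y):
    # alpha^0 of (47): 1/t on the t positive examples, 1/(l - t) on the l - t negative ones.
    y = np.asarray(y, dtype=float)
    l, t = y.size, int(np.sum(y == 1))
    alpha0 = np.where(y == 1, 1.0 / t, 1.0 / (l - t))
    assert abs(float(np.dot(alpha0, y))) < 1e-9
    return alpha0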

4.3. Description of the Cost-Sensitive Support Vector Machine Using the Randomized Dual Coordinate Descent Method. The accelerating conditions of Section 4.1 are checked first. One then chooses the coordinate pair $(i, j)$ to optimize, uses formula (43) to calculate $d_i$, formulas (39) and (40) to update $\alpha_i^{k_i}$, $\alpha_j^{k_j}$, and formula (45) to update $w$. Each inner iteration therefore takes $O(n)$ effort. The computer memory is mainly used to store the sample information $x_1, x_2, \ldots, x_l$ and the inner products $Q_{ii} = y_i x_i^T x_i y_i$. The cost-sensitive support vector machine using the randomized dual coordinate descent algorithm is described as follows.

Algorithm 3 (cost-sensitive support vector machine using randomized dual coordinate descent (CSVM-RDCD)).

Input: sample information $(x_i, y_i)_{i=1}^{l}$, $\alpha^0$.

Output: $\alpha^{T}$, $w$.

Initialize $\alpha^0$ and the corresponding $w = \sum_{i=1}^{l} \alpha_i x_i y_i$.

For $k = 1, 2, \ldots, T$ do

$$\alpha^{k,1} \leftarrow \alpha^{k}. \tag{48}$$

Step 1. Randomly choose $i, j \in \{1, 2, \ldots, l\}$, $i \ne j$, and the corresponding $y_i$, $y_j$.


Table 1: Specification of the benchmark datasets. Number of ex is the number of example data points; Number of feat is the number of features; Ratio is the class imbalance ratio; Positive class specifies the target or positive class; Number is the index of the dataset.

Dataset | Number of ex | Number of feat | Ratio | Positive class | Number
KDD99 (intrusion detection) | 5209460 | 42 | 1:4 | Normal | 1
Web spam | 350000 | 254 | 1:2 | −1 | 2
MNIST | 70000 | 780 | 1:10 | 5 | 3
Covertype | 581012 | 54 | 1:211 | Cottonwood/Willow (4) | 4
SIAM1 | 28596 | 30438 | 1:2000 | 1, 6, 7, 11 | 5
SIAM11 | 28596 | 30438 | 1:716 | 11, 12 | 6

Step 2.

If ($y_i = +1$, $y_j = -1$):
$$U_1 = \max(0,\ \alpha_i - \alpha_j), \qquad V_1 = \min\left( \alpha_i - \alpha_j + (1-\gamma)C,\ \gamma C \right);$$
if ($y_i = +1$, $y_j = +1$):
$$U_2 = \max(0,\ \alpha_i + \alpha_j - \gamma C), \qquad V_2 = \min\left( \alpha_i + \alpha_j,\ \gamma C \right);$$
if ($y_i = -1$, $y_j = -1$):
$$U_3 = \max\left(0,\ \alpha_i + \alpha_j - (1-\gamma)C\right), \qquad V_3 = \min\left( \alpha_i + \alpha_j,\ (1-\gamma)C \right);$$
if ($y_i = -1$, $y_j = +1$):
$$U_4 = \max(0,\ \alpha_i - \alpha_j), \qquad V_4 = \min\left( \gamma C + \alpha_i - \alpha_j,\ (1-\gamma)C \right). \tag{49}$$

Step 3.

$$\alpha_i^{\text{temp}} = \alpha_i^{k_i} + \frac{2 w^T x_i y_i - 1 - \left( 2 w^T x_j y_j - 1 \right) y_i y_j}{2 \left( Q_{ii} + Q_{jj} \right)}. \tag{50}$$

Step 4. Set $m = 1$ if ($y_i = +1$, $y_j = -1$); $m = 2$ if ($y_i = +1$, $y_j = +1$); $m = 3$ if ($y_i = -1$, $y_j = -1$); $m = 4$ if ($y_i = -1$, $y_j = +1$). Then

$$\alpha_i^{k_i+1} = \begin{cases} V_m, & \text{if } \alpha_i^{\text{temp}} > V_m, \\ \alpha_i^{\text{temp}}, & \text{if } U_m < \alpha_i^{\text{temp}} < V_m, \\ U_m, & \text{if } \alpha_i^{\text{temp}} < U_m. \end{cases} \tag{51}$$

Step 5.

$$\alpha_j^{k_j+1} = y_j \left( y_i \alpha_i^{k_i} + y_j \alpha_j^{k_j} - y_i \alpha_i^{k_i+1} \right). \tag{52}$$

Step 6.

$$w \leftarrow w + \left( \alpha_i^{k_i+1} - \alpha_i^{k_i} \right) x_i y_i + \left( \alpha_j^{k_j+1} - \alpha_j^{k_j} \right) x_j y_j. \tag{53}$$

Until a stopping condition is satisfied:

$$\alpha^{k+1} \leftarrow \alpha^{k}. \tag{54}$$

End for.
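Putting the pieces together, the following sketch of Algorithm 3 assumes dense NumPy arrays, reuses the helpers sketched earlier (initial_alpha, should_skip, direction, update_pair), treats d_i as a stand-in for the projected gradient in the skip test, and uses a fixed iteration budget T in place of a tuned stopping condition.

import numpy as np

def csvm_rdcd(X, y, C, gamma, T=100000, seed=0):
    # Sketch of CSVM-RDCD: randomized dual coordinate descent over pairs (i, j).
    rng = np.random.default_rng(seed)
    l, n = X.shape
    alpha = initial_alpha(y)                  # feasible alpha^0, (47)
    w = (alpha * y) @ X                       # w = sum_s alpha_s y_s x_s, (41)
    Q_diag = np.einsum("ij,ij->i", X, X)      # Q_ii = ||x_i||^2, since y_i^2 = 1
    for _ in range(T):
        i, j = rng.choice(l, size=2, replace=False)     # Step 1
        d_i = direction(w, X, y, i, j, Q_diag)          # Step 3, (50)
        if should_skip(alpha[i], alpha[j], C, d_i):     # speed-up conditions of Section 4.1
            continue
        a_i, a_j = update_pair(alpha[i], alpha[j], y[i], y[j], d_i, C, gamma)       # Steps 2, 4, 5
        w = w + (a_i - alpha[i]) * y[i] * X[i] + (a_j - alpha[j]) * y[j] * X[j]     # Step 6, (53)
        alpha[i], alpha[j] = a_i, a_j
    return alpha, w

A test point x is then classified by the sign of w @ x.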

5. Experiments and Analysis

5.1. Experiment on Big Class-Imbalanced Benchmark Dataset Classification Problems. In this section we analyze the performance of the proposed cost-sensitive support vector machine using the randomized dual coordinate descent method (CSVM-RDCD) and compare it with state-of-the-art cost-sensitive support vector machines. Three implementations of related cost-sensitive SVMs are compared: the proposed CSVM-RDCD, implemented by modifying the LIBSVM [8, 9] source code; the parallel cost-sensitive support vector machine (PCSVM) proposed by Eitrich and Lang [10]; and the cost-sensitive support vector machines (CSSVM) proposed in [6].

Table 1 lists the statistics of the data sets. KDD99, Web Spam, Covertype, MNIST, SIAM1, and SIAM11 are obtained from the following websites: KDD99 from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html; Web Spam from http://www.cc.gatech.edu/projects/doi/WebbSpamCorpus.html; Covertype from http://archive.ics.uci.edu/ml/datasets/Covertype; MNIST from http://yann.lecun.com/exdb/mnist/; SIAM from https://c3.nasa.gov/dashlink/resources/138/.

To evaluate the performance of the CSVM-RDCD method, we use a stratified selection to split each dataset into 9/10 training and 1/10 testing. For each dataset we choose the class with the higher cost or fewer data points as the target or positive class, and all multiclass datasets were converted to binary datasets. In particular, the binary datasets SIAM1 and SIAM11 were constructed from the same multiclass dataset but with different target classes and different imbalance ratios.
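A sketch of the stratified 9/10-1/10 split; scikit-learn's StratifiedShuffleSplit is one convenient way to do this (the paper does not name its tooling, so this choice is an assumption).

from sklearn.model_selection import StratifiedShuffleSplit

def stratified_split(X, y, seed=0):
    # One stratified shuffle split: 9/10 training, 1/10 testing, preserving the class ratio.
    splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.1, random_state=seed)
    train_idx, test_idx = next(splitter.split(X, y))
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]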


Table 2: Cost-sensitive parameters of the three CSVM algorithms on the 6 datasets.

Dataset | Cγ | C(1−γ)
KDD99 (intrusion detection) | 4 | 1
Web spam | 2 | 1
MNIST | 10 | 1
Covertype | 211 | 1
SIAM1 | 2000 | 1
SIAM11 | 716 | 1

The performance of the three CSVM algorithms was evaluated by 10-fold cross-validation in terms of the average misclassification cost, the training time, and the recognition rate of the positive class (i.e., the recognition rate of the minority class).

The average misclassification cost (AMC) is the 10-fold cross-validation average misclassification cost on the test datasets for the compared CSVMs:

$$\text{AMC} = \frac{fp \times C\gamma + fn \times C(1-\gamma)}{N}, \tag{55}$$

where $fp$ and $fn$ are the number of positive examples misclassified as negative and the number of negative examples misclassified as positive in the test dataset, respectively; $C\gamma$ and $C(1-\gamma)$ denote the corresponding misclassification costs; and $N$ denotes the number of test examples.

The recognition rate of the positive class is the ratio of the number of correctly classified positive examples to the number of positive examples in the testing dataset.

The training time in seconds is used to evaluate the convergence speed of the three CSVM algorithms on the same computer.

The cost-sensitive parameters of the three CSVM algorithms are specified in Table 2. The parameters $C\gamma$ and $C(1-\gamma)$ are set according to the class imbalance ratio of each dataset, namely the ratio between the majority and the minority class.
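For concreteness, the two quality measures can be computed as follows; fp and fn follow the definitions under (55) (positives missed and negatives missed, respectively), and the function names are illustrative.

def average_misclassification_cost(y_true, y_pred, C_gamma, C_1mg):
    # AMC of (55): cost-weighted error count divided by the number of test examples N.
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == +1 and p == -1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == +1)
    return (fp * C_gamma + fn * C_1mg) / len(y_true)

def positive_recognition_rate(y_true, y_pred):
    # Fraction of positive test examples recognized as positive.
    positives = [p for t, p in zip(y_true, y_pred) if t == +1]
    return sum(1 for p in positives if p == +1) / len(positives)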

Three datasets with moderate class imbalance are examined, namely, KDD99 (intrusion detection), Web spam, and MNIST; three datasets with severe class imbalance are examined, namely, Covertype, SIAM1, and SIAM11. The average misclassification cost comparison of the three CSVM algorithms on each dataset is shown in Figure 1. The CSVM-RDCD algorithm outperforms PCSVM and CSSVM on all datasets.

The positive class recognition rate comparison of the three algorithms is shown in Figure 2. The CSVM-RDCD algorithm outperforms CSSVM on all datasets; it surpasses PCSVM on four datasets and ties with PCSVM on the remaining two.

Figure 1: Average misclassification cost comparison of the three CSVM algorithms on the 6 datasets.

Figure 2: Positive class recognition rate comparison of the three CSVM algorithms.

We examine the large datasets with moderate and severe imbalance ratios to evaluate the convergence speed of the CSVM-RDCD algorithm. The training time comparison of the three algorithms on each dataset is shown in Figure 3. The CSVM-RDCD algorithm outperforms PCSVM and CSSVM on all datasets.

Figure 3: Training time (s) comparison of the three CSVM algorithms.

5.2. Experiment on Real-World Big Class-Imbalanced Dataset Classification Problems. In order to verify the effectiveness of the proposed CSVM-RDCD algorithm on real-world big class-imbalanced data classification problems, it was evaluated using real vibration data measured on a wind turbine, with the dataset specified below.

Table 3: Specification of the real-world dataset and cost-sensitive parameters on the dataset.

Dataset | Fault diagnosis of wind turbine
Number of ex | 20000
Number of feat | 56
Ratio | 1:90
Positive class | Local crack fault
Cγ | 190
C(1−γ) | 1

The experimental data came from the SKF WindCon software and were collected from a wind turbine gearbox of type TF138-A [11]. The vibration signals were continuously acquired by an accelerometer mounted on the outer case of the gearbox.

All parameter settings for the dataset are listed in Table 3. The statistical results that measure the quality of results on the big class-imbalanced data problem (average misclassification cost, recognition rate of the positive class, and training time) are listed in Table 4. From Table 4 it can be concluded that CSVM-RDCD consistently achieves superior performance on this big class-imbalanced data classification problem.

Table 4: Classification performance of the three CSVM algorithms on the dataset.

Name of algorithm | AMC | Recognition of positive class | Training time
CSVM-RDCD | 0.01825 | 98.5% | 5.43 s
PCSVM | 0.0334 | 97.6% | 6.15 s
CSSVM | 0.0379 | 97.2% | 7.34 s

Experimental results show that the randomized dual coordinate descent method is well suited to solving the cost-sensitive SVM dual problem on large-scale datasets. The proposed method achieves superior performance in average misclassification cost, recognition rate of the positive class, and training time. The large-scale experiments show that cost-sensitive support vector machines using the randomized dual coordinate descent method run more efficiently than both PCSVM and CSSVM; in particular, the randomized dual coordinate descent algorithm has an advantage in training time on large-scale datasets. CSSVM needs to build the full gradient and the kernel matrix $Q$ and must perform complex working-set selection during the decomposition algorithm. The decomposition algorithm updates the full gradient information as a whole, with computational complexity $O(ln)$ per full gradient update; PCSVM has a similar computational complexity. The randomized dual coordinate gradient method updates one coordinate pair at a time with complexity $O(n)$, which considerably increases the convergence speed of the proposed method.

6. Conclusions

The randomized dual coordinate descent (RDCD) method is an optimization algorithm that updates the global solution by solving each subproblem analytically. The RDCD method has a rapid convergence rate, mainly for the following reasons: (1) the subproblem has a closed-form analytical solution, so the solving process requires no complex numerical optimization; (2) each component is updated on the basis of the previously updated components, so, compared with the CSSVM method, which updates the full gradient information as a whole, the objective function of the RDCD method declines faster; (3) the single-coordinate gradient calculation of the RDCD method is simpler and cheaper than the full gradient calculation.

The randomized dual coordinate descent method is applied to the cost-sensitive support vector machine, which expands the scope of application of the randomized dual coordinate descent method. For the large-scale class-imbalanced problem, a cost-sensitive SVM using the randomized dual coordinate descent method is proposed. Experimental results and analysis show the effectiveness and feasibility of the proposed method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Science Fund for Distinguished Young Scholars of China (no. 61025015), the National Natural Science Foundation of China (nos. 51305046 and 61304019), the Research Foundation of the Education Bureau of Hunan Province, China (no. 12A007), the Open Fund of the Hunan Province University Key Laboratory of Bridge Engineering (Changsha University of Science and Technology), the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province (Changsha University of Science and Technology), the Key Laboratory of Efficient and Clean Energy Utilization, College of Hunan Province, and the Introduction of Talent Fund of Changsha University of Science and Technology.

References

[1] M. A. Davenport, "The 2nu-SVM: a cost-sensitive extension of the nu-SVM," Tech. Rep. TREE 0504, Department of Electrical and Computer Engineering, Rice University, 2005.
[2] M. Kim, "Large margin cost-sensitive learning of conditional random fields," Pattern Recognition, vol. 43, no. 10, pp. 3683–3692, 2010.
[3] Y.-J. Park, S.-H. Chun, and B.-C. Kim, "Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis," Artificial Intelligence in Medicine, vol. 51, no. 2, pp. 133–145, 2011.
[4] J. Kim, K. Choi, G. Kim, and Y. Suh, "Classification cost: an empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost," Expert Systems with Applications, vol. 39, no. 4, pp. 4013–4019, 2012.
[5] C.-Y. Yang, J.-S. Yang, and J.-J. Wang, "Margin calibration in SVM class-imbalanced learning," Neurocomputing, vol. 73, no. 1–3, pp. 397–411, 2009.
[6] H. Masnadi-Shirazi, N. Vasconcelos, and A. Iranmehr, "Cost-sensitive support vector machines," http://arxiv.org/abs/1212.0975.
[7] Y. Artan, M. A. Haider, D. L. Langer, et al., "Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2444–2455, 2010.
[8] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in Proceedings of the 25th International Conference on Machine Learning, pp. 408–415, July 2008.
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: a library for large linear classification," The Journal of Machine Learning Research, vol. 9, pp. 1871–1874, 2008.
[10] T. Eitrich and B. Lang, "Parallel cost-sensitive support vector machine software for classification," in Proceedings of the Workshop from Computational Biophysics to Systems Biology, NIC Series, pp. 141–144, John von Neumann Institute for Computing, Jülich, Germany, 2006.
[11] B. Tang, W. Liu, and T. Song, "Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner-Ville distribution," Renewable Energy, vol. 35, no. 12, pp. 2862–2866, 2010.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Cost-Sensitive Support Vector Machine Using …downloads.hindawi.com/journals/aaa/2014/416591.pdf · 2019-07-31 · Research Article Cost-Sensitive Support Vector

Abstract and Applied Analysis 5

Due to the box constraints of (10) there will appear a lotof boundary points of coordinates (120572119896119894

119894= 119862 or 120572119896119894

119894= 0) in

computing process If there are two coordinates (120572119896119894119894and 120572119896119894119894)

as the value of 0 or 119862 in an iteration process the analyticalsolution of the two subvariables updates coordinates withoutcalculating The reason is that the formula (17) guarantees120572119896119894

119894+ 120572119896119895

119894= 0 or 120572119896119894

119894+ 120572119896119895

119894= 2119862 while double restricted

box constrained optimization 0 le 120572119896119894119894le 119862 if the result is 0 or

119862 Constrained conditions will be edited ultimately as 0 or 119862The constrained conditions are not updated

Condition 2 If projected gradient is 0 the constrainedconditions are not updated

42 Dual Problem of Cost-Sensitive Support Vector MachineUsing Randomized Dual Coordinate Descent Method and ItsComplexity Analysis From the above algorithmderivation ofview solving of CSVM seems to have been successful but thecomputational complexity of the solving process is also largerAssume that the average value of nonzero feature of eachsample is 119899 Firstly the computational complexity of the innerproduct matrix 119876

119894119894= 119910119894119909119879

119894119909119894119910119894 119894 = 1 2 119897 is 119874(119897119899) but the

process can be operated in advance and stored into memorySecondly the calculation of (120572119896119894119876)

119894= sum119897

119904=1119876119894119904120572119896119904

119894takes the

computational complexity 119874(119897119899) The amount of calculationis very great when the data size is large However there is alinear relationship model CSVM

119908 =

119897

sum119904=1

120572119896119904

119894119909119904119910119904 (41)

Thus (120572119896119894119876)119894is further simplified as follows

(120572119896119894119876)119894=

119897

sum119904=1

119876119894119904120572119896119904

119894=

119897

sum119904=1

(120572119896119904

119894119910119904119909119879

119904) 119909119904119910119904= 119908119879119909119894119910119894 (42)

Solving the corresponding formula (23) can be simplifiedas follows

119889119894=2(120572119896119894119876)119894minus 1 minus (2(120572

119896119895119876)119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

=2119908119879119909119894119910119894minus 1 minus (2119908

119879119909119895119910119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

(43)

As can be seen computing119889119894with the complexity of119874(119897119899)

becomes computing 119908119879119909119894119910119894with the complexity of 119874(119899)

Thus the calculation times reduce 119897 minus 1 However

119908 =

119897

sum119904=1

120572119896119904

119894119909119904119910119904 (44)

where 119908 is the updated still computational complexity119874(119897119899)The amount of calculation can be reduced significantly when119908 is updated by changing 120572119896119904

119894 Let 120572119896119894

119894 120572119896119895119894

be the values of

the current selection 120572119896119894+1119894

120572119896119895+1119894

are updated values whichcan be updated via a simple way

119908 larr997888 119908 + (120572119896119894

119894minus 120572119896119894+1

119894) 119909119894119910119894+ (120572119896119895

119894minus 120572119896119895+1

119894) 119909119895119910119895 (45)

Its computational complexity only is 119874(119899) So whethercalculating 119889

119894or updating 119908 coordinate gradient compu-

tation complexity is 119874(119899) which is one of the coordinategradient method rapid convergence speed reasons

When assigned an initial value and constraints based on

119899

sum119894=1

120572119894119910119894= 0

0 le 120572119894le 120574119862

0 le 120572119895le (1 minus 120574)119862

(46)

set the initial point

1205720= [ 120572

119894 120572

119895 ]

119879

1times119897 (47)

where 120572119894= 1119905 120572

119895= (1(119897 minus 119905))119897 119905 are the total number

of samples and the number of positive samples respectivelyThus the weight vector 119908 of the original problem is obtainedby optimizing Lagrangian multipliers 120572 of the dual problem

43 Description of Cost-Sensitive Support Vector MachineUsing Randomized Dual Coordinate Descent Method Accel-erated conditions are judged by Section 41 One chooses thecoordinate optimized number 119894 119895 We use formula (43) tocalculate119889

119894 formulas (39) and (40) to update120572119896119894

119894120572119896119895119894 and the

formula (45) to update 119908 respectively We can see that inneriteration takes 119874(119899) effort The computer memory is mainlyused to store samples information 119909

1 1199092 119909

119897and each

sample point and their inner products 119876119894119894= 119910119894119909119879

119894119909119894119910119894 Cost-

sensitive support vector machine using dual randomizedcoordinate gradient descent algorithm is described as follows

Algorithm 3 Cost-sensitive support vector machine usingrandomized dual coordinate descent algorithm (CSVM-RDCD)

Input sample information (119909119894 119910119894)119897

119894=1 1205720

Output 120572119879119908

Initialize 1205720 and the corresponding 119908 = sum119897119894=1120572119894119909119894119910119894

For 119896 = 1 2 119879 do

1205721198961larr997888 120572

119896 (48)

Step 1 Randomly choose 119894 119895 isin 1 2 119897 119894 = 119895 and thecorresponding 119910

119894 119910119895

6 Abstract and Applied Analysis

Table 1 Specification of the benchmark datasets Number of ex is the number of example data points Number of feat is the number offeatures Ratio is the class imbalance ratio Target specifies the target or positive class Number is the number of the datasets

Dataset Number of ex Number of feat Ratio Positive class NumberKDD99 (intrusion detection) 5209460 42 1 4 Normal 1Web span 350000 254 1 2 minus1 2MNIST 70000 780 1 10 5 3Covertype 581012 54 1 211 CottonwoodWillow(4) 4SIAM1 28596 30438 1 2000 1 6 7 11 5SIAM11 28596 30438 1 716 11 12 6

Step 2

If (119910119894= +1 119910

119895= minus1)

1198801= max (0 120572

119894minus 120572119895)

1198811= min (120572

119894minus 120572119895minus (1 minus 120574)119862 120574119862)

if (119910119894= +1 119910

119895= +1)

1198802= max (0 120572

119894+ 120572119895minus 120574119862)

1198812= min (120572

119894+ 120572119895 120574119862)

if (119910119894= minus1 119910

119895= minus1)

1198803= max (0 minus120572

119894minus 120572119895minus (1 minus 120574)119862)

1198813= min (minus120572

119894minus 120572119895 (1 minus 120574) 119862)

if (119910119894= minus1 119910

119895= +1)

1198804= max (0 minus120572

119894minus 120572119895)

1198814= min (120574119862 minus 120572

119894minus 120572119895 (1 minus 120574) 119862)

(49)

Step 3

120572temp119894

= 120572119896119894

119894+2119908119879119909119894119910119894minus 1 minus (2119908

119879119909119895119910119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

(50)

Step 4

if (119910119894= 1 119910

119895= minus1)119898 = 1

if (119910119894= 1 119910

119895= 1)119898 = 2

if (119910119894= minus1 119910

119895= minus1)119898 = 3

119891 (119910119894= minus1 119910

119895= +1)119898 = 4

120572119896119894+1

119894=

119881119898 if 120572temp

119894gt 119881119898

120572temp119894 if 119880

119898lt 120572

temp119894

lt 119881119898

119880119898 if 120572temp

119894lt 119880119898

(51)

Step 5

120572119896119895+1

119894= 119910119895119910119894120572119896119894

119894+ 119910119895120572119896119895

119895minus 119910119894120572119896119894+1

119894 (52)

Step 6

119908 larr997888 119908 + (120572119896119894

119894minus 120572119896119894+1

119894) 119909119894119910119894+ (120572119896119895

119894minus 120572119896119895+1

119894) 119909119895119910119895 (53)

Until A stopping condition is satisfied

120572119896+1larr997888 120572

119896 (54)

End for

5 Experiments and Analysis

51 Experiment on Big Class-Imbalanced Benchmark DatasetsClassification Problem In this section we analyze the perfor-mance of the proposed cost-sensitive support vectormachineusing randomized dual coordinate descent method (CSVM-RDCD) We compare our implementation with the state-of-the-art cost-sensitive support vector Three implementationsof related cost-sensitive SVMs are compared We proposedthe cost-sensitive SVM using randomized dual coordinatedescent method (CSVM-RDCD) by modifying the LibSVM[8 9] source code Eitrich and Lang [10] proposed parallelcost-sensitive support vector machine (PCSVM) [6] pro-posed cost-sensitive support vector machines (CSSVM)

Table 1 lists the statistics of data sets KDD99 WebSpam Covertype MNIST SIAM1 and SIAM11 are obtainedfrom the following data website KDD99 is at httpkddicsuciedudatabaseskddcup99kddcup99html Web Spam isat httpwwwccgatecheduprojectsdoiWebbSpamCorpushtml Covertype is at httparchiveicsuciedumldatasetsCovertype MNIST is at httpyannlecuncomexdbmnistSIAM is at httpsc3nasagovdashlinkresources138

To evaluate the performance of CSVM-RDCD methodwe use a stratified selection to split each dataset to 910training and 110 testing We briefly describe each set belowFor each dataset we choose the class with the higher cost orfewer data points as the target or positive class All multiclassdatasets were converted to binary data sets In particular thebinary datasets SIAM1 and SIAM2 are datasets which havebeen constructed from the same multiclass dataset but withdifferent target class and different imbalance ratios

Abstract and Applied Analysis 7

Table 2 Cost-sensitive parameters of three algorithms using CSVMon 6 datasets

Dataset 119862120574 119862(1 minus 120574)

KDD99 (intrusion detection) 4 1Web span 2 1MNIST 10 1Covertype 211 1SIAM1 2000 1SIAM11 716 1

Evaluation of the performance of the three algorithmsusing CSVM was 10-fold cross-validation tested averagemisclassification cost the training time and the recognitionrate of the positive class (ie the recognition rate of theminority class)

Average misclassification cost (AMC) represents 10-foldcross-validation averagemisclassification cost on test datasetsfor related CSVMs described as follows

AMC =119891119901 times 119862120574 + 119891119899 times 119862 (1 minus 120574)

119873 (55)

where 119891119901 and 119891119899 represent the number of the positiveexamples misclassified as the negative examples in the testdataset and the number of the negative examplesmisclassifiedas the positive examples in the test data set respectively119862120574 and 119862(1 minus 120574) denote the cost of the positive examplesmisclassified as the negative examples in the test data setand the cost of the negative examples misclassified as thepositive examples in the test data set respectively119873 denotesthe number of test examples

The recognition rate of the positive class is the numberof classified positive classes and the number of the positiveclasses on testing dataset

Training time in seconds is used to evaluate the conver-gence speeding of three algorithms using CSVM on the samecomputer

Cost-sensitive parameters of three algorithms usingCSVM are specified as the following Table 2 Cost-sensitiveparameters 119862120574 and 119862(1 minus 120574) are valued according to the classratio of datasets namely the class ratio of minority class andmajority class

Three datasets with relative class imbalance are exam-ined Namely KDD99 (intrusion detection) Web span andMNIST datasets are considered Three datasets with severeclass imbalance are examined Namely Covertype SIAM1and SIAM11 datasets are considered The average misclassi-fication cost comparison of three algorithms using CSVMis shown in Figure 1 for each of the datasets The CSVM-RDCD algorithm outperforms the PCSVM and CSVM on alldatasets

The recognition rate of positive class comparison of threealgorithms using CSVM is shown in Figure 2 for each ofthe datasets The CSVM-RDCD algorithm outperforms thePCSVM and CSSVM on all datasets surpasses the PCSVMon four datasets and ties with the PCSVM on two datasets

We examine large datasets with relative imbalance ratiosand severe imbalance ratios to evaluate the convergence

1 2 3 4 5 6003

0035

004

0045

005

0055

006

0065

007

0075

AM

C

AMC

CSVM-RDCDPCSVMCSSVM

Datasets

Figure 1 Average misclassification cost comparison of three algo-rithms using CSVM on 6 datasets

1 2 3 4 5 60972

0974

0976

0978

098

0982

0984

0986

0988

099

0992

Datasets

Reco

gniti

on ra

te o

f pos

itive

clas

s

Recognition rate of positive class

CSVM-RDCDPCSVMCSSVM

Figure 2 The positive class recognition rate comparison of threealgorithms using CSVM

speed of CSVM-RDCD algorithm The training time com-parison of three algorithms using CSVM is shown in Figure 3for each of the datasetsTheCSVM-RDCD algorithm outper-forms the PCSVM and CSSVM on all datasets

52 Experiment on Real-World Big Class-Imbalanced DatasetClassification Problems In order to verify the effectivenessof the proposed algorithm CSVM-RDCD on real-worldbig class-imbalanced data classification problems it was

8 Abstract and Applied Analysis

1 2 3 4 5 60

5

10

15

20

25

30

Trai

ning

tim

e (s)

Training time

Datasets

CSVM-RDCDPCSVMCSSVM

Figure 3 Training time (s) comparison of three algorithms usingCSSVM

Table 3 Specification of the benchmark datasets and cost-sensitiveparameters on the dataset

Dataset Fault diagnosis of wind turbineNumber of ex 20000Number of feat 56Ratio 190Positive class Local crack fault119862120574 190119862(1 minus 120574) 1

evaluated using the real vibration data measured in the windturbine The experimental data were from the SKF Wind-Con software and collected from a wind turbine gearboxtype TF138-A [11] The vibration signals were continuouslyacquired by an accelerometer mounted on the outer case ofthe gearbox

All parameter settings for the dataset are listed in Table 3The statistical results of the big class-imbalanced data prob-lems that measure the quality of results (average misclassi-fication cost recognition rate of positive class and trainingtime) are listed in Table 4 From Table 4 it can be concludedthat CSVM-RDCD is able to consistently achieve superiorperformance in the big class-imbalanced data classificationproblems

Experimental results show that it is applicable to solvecost-sensitive SVM dual problem using randomized dualcoordinate descent method on the large-scale experimentaldata sets The proposed method can achieve superior perfor-mance in the average misclassification cost recognition rateof positive class and training time Large-scale experimentaldata sets show that cost-sensitive support vector machinesusing randomized dual coordinate descent method run

Table 4 Classification performance of three algorithms usingCSVM on the dataset

Name ofalgorithms AMC Recognition of

positive class Training time

CSVM-RDCD 001825 985 543 sPCSVM 00334 976 615 sCSSVM 00379 972 734 s

more efficiently than both PCSVM and CSSVM especiallyrandomized dual coordinate descent algorithmhas advantageof training time on large-scale data sets CSSVM needs tobuild complex whole gradient and kernel matrix 119876 andneeds to select the set of complex work in solving process ofdecomposition algorithm Decomposition algorithm updatesfull uniform gradient information as a whole the computa-tional complexity 119874(119897119899) for the full gradient update PCSVMalso has similar computational complexity Randomized dualcoordinate gradient method updates linearly the coordinatesits computational complexity as 119874(119899) which increases con-siderably the convergence speed of the proposed method

6 Conclusions

Randomized dual coordinate descentmethod (RDCD) is theoptimization algorithm to update the global solution which isobtained by solving an analytical solution of the suboptimalproblemThe RDCDmethod has the rapid convergence ratewhich is mainly due to the following (1) the subproblemhas formal analytical solution which is solved in solvingprocess without complex numerical optimization (2) thenext component of RDCD method in solving process isupdated on the basis of a previous component comparedwith the full gradient information updated CSSVM methodas a whole the objective function of RDCD method candecline faster (3) the single coordinate gradient calculationof RDCDmethod is simpler and easier than the full gradientcalculation

Randomized dual coordinate descent method is appliedto cost-sensitive support vector machine which expandedthe scope of application of the randomized dual coordinatedescent method For large-scale class-imbalanced problema cost-sensitive SVM using randomized dual coordinatedescent method is proposed Experimental results and anal-ysis show the effectiveness and feasibility of the proposedmethod

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work is partially supported by the National ScienceFund for Distinguished Young Scholars of China (no61025015) the National Natural Science Foundation of China

Abstract and Applied Analysis 9

(nos 51305046 and 61304019) the Research Foundation ofEducation Bureau of Hunan Province China (no 12A007)Open Fund of Hunan Province University Key Laboratoryof Bridge Engineering (Changsha University of Scienceand Technology) the Key Laboratory of Renewable EnergyElectric-Technology of Hunan Province (Changsha Univer-sity of Science and Technology) the Key Laboratory ofEfficient and Clean Energy Utilization College of HunanProvince and the Introduction of Talent Fund of ChangshaUniversity of Science and Technology

References

[1] M A Davenport ldquoThe 2nu-SVM a cost-sensitive extension ofthe nu-SVMrdquo Tech Rep TREE 0504 Department of Electricaland Computer Engineering Rice University 2005

[2] M Kim ldquoLarge margin cost-sensitive learning of conditionalrandom fieldsrdquo Pattern Recognition vol 43 no 10 pp 3683ndash3692 2010

[3] Y-J Park S-H Chun and B-C Kim ldquoCost-sensitive case-based reasoning using a genetic algorithm application tomedical diagnosisrdquoArtificial Intelligence inMedicine vol 51 no2 pp 133ndash145 2011

[4] J Kim K Choi G Kim and Y Suh ldquoClassification costan empirical comparison among traditional classifier Cost-Sensitive Classifier and MetaCostrdquo Expert Systems with Appli-cations vol 39 no 4 pp 4013ndash4019 2012

[5] C-Y Yang J-S Yang and J-J Wang ldquoMargin calibration inSVM class-imbalanced learningrdquo Neurocomputing vol 73 no1ndash3 pp 397ndash411 2009

[6] H Masnadi-Shirazi N Vasconcelos and A Iranmehr ldquoCost-sensitive supportvector machinesrdquo httparxivorgabs12120975

[7] Y Artan M A Haider D L Langer et al ldquoProstate cancerlocalization with multispectral MRI using cost-sensitive sup-port vector machines and conditional random fieldsrdquo IEEETransactions on Image Processing vol 19 no 9 pp 2444ndash24552010

[8] C-J Hsieh K-W Chang C-J Lin S S Keerthi and SSundararajan ldquoA dual coordinate descent method for large-scale linear SVMrdquo in Proceedings of the 25th InternationalConference on Machine Learning pp 408ndash415 July 2008

[9] R-E Fan K-W Chang C-J Hsieh X-R Wang and C-JLin ldquoLIBLINEAR a library for large linear classificationrdquo TheJournal of Machine Learning Research vol 9 pp 1871ndash18742008

[10] T Eitrich and B Lang ldquoParallel cost-sensitive support vectormachine software for classificationrdquo in Proceedings of the Work-shop from Computational Biophysics to Systems Biology NICSeries pp 141ndash144 John vonNeumann Institute forComputingJulich Germany 2006

[11] B Tang W Liu and T Song ldquoWind turbine fault diagnosisbased on Morlet wavelet transformation and Wigner-Villedistributionrdquo Renewable Energy vol 35 no 12 pp 2862ndash28662010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Cost-Sensitive Support Vector Machine Using …downloads.hindawi.com/journals/aaa/2014/416591.pdf · 2019-07-31 · Research Article Cost-Sensitive Support Vector

6 Abstract and Applied Analysis

Table 1 Specification of the benchmark datasets Number of ex is the number of example data points Number of feat is the number offeatures Ratio is the class imbalance ratio Target specifies the target or positive class Number is the number of the datasets

Dataset Number of ex Number of feat Ratio Positive class NumberKDD99 (intrusion detection) 5209460 42 1 4 Normal 1Web span 350000 254 1 2 minus1 2MNIST 70000 780 1 10 5 3Covertype 581012 54 1 211 CottonwoodWillow(4) 4SIAM1 28596 30438 1 2000 1 6 7 11 5SIAM11 28596 30438 1 716 11 12 6

Step 2

If (119910119894= +1 119910

119895= minus1)

1198801= max (0 120572

119894minus 120572119895)

1198811= min (120572

119894minus 120572119895minus (1 minus 120574)119862 120574119862)

if (119910119894= +1 119910

119895= +1)

1198802= max (0 120572

119894+ 120572119895minus 120574119862)

1198812= min (120572

119894+ 120572119895 120574119862)

if (119910119894= minus1 119910

119895= minus1)

1198803= max (0 minus120572

119894minus 120572119895minus (1 minus 120574)119862)

1198813= min (minus120572

119894minus 120572119895 (1 minus 120574) 119862)

if (119910119894= minus1 119910

119895= +1)

1198804= max (0 minus120572

119894minus 120572119895)

1198814= min (120574119862 minus 120572

119894minus 120572119895 (1 minus 120574) 119862)

(49)

Step 3

120572temp119894

= 120572119896119894

119894+2119908119879119909119894119910119894minus 1 minus (2119908

119879119909119895119910119895minus 1) 119910

119894119910119895

2 (119876119894119894+ 119876119895119895)

(50)

Step 4

if (119910119894= 1 119910

119895= minus1)119898 = 1

if (119910119894= 1 119910

119895= 1)119898 = 2

if (119910119894= minus1 119910

119895= minus1)119898 = 3

119891 (119910119894= minus1 119910

119895= +1)119898 = 4

120572119896119894+1

119894=

119881119898 if 120572temp

119894gt 119881119898

120572temp119894 if 119880

119898lt 120572

temp119894

lt 119881119898

119880119898 if 120572temp

119894lt 119880119898

(51)

Step 5

120572119896119895+1

119894= 119910119895119910119894120572119896119894

119894+ 119910119895120572119896119895

119895minus 119910119894120572119896119894+1

119894 (52)

Step 6

119908 larr997888 119908 + (120572119896119894

119894minus 120572119896119894+1

119894) 119909119894119910119894+ (120572119896119895

119894minus 120572119896119895+1

119894) 119909119895119910119895 (53)

Until A stopping condition is satisfied

120572119896+1larr997888 120572

119896 (54)

End for

5 Experiments and Analysis

51 Experiment on Big Class-Imbalanced Benchmark DatasetsClassification Problem In this section we analyze the perfor-mance of the proposed cost-sensitive support vectormachineusing randomized dual coordinate descent method (CSVM-RDCD) We compare our implementation with the state-of-the-art cost-sensitive support vector Three implementationsof related cost-sensitive SVMs are compared We proposedthe cost-sensitive SVM using randomized dual coordinatedescent method (CSVM-RDCD) by modifying the LibSVM[8 9] source code Eitrich and Lang [10] proposed parallelcost-sensitive support vector machine (PCSVM) [6] pro-posed cost-sensitive support vector machines (CSSVM)

Table 1 lists the statistics of data sets KDD99 WebSpam Covertype MNIST SIAM1 and SIAM11 are obtainedfrom the following data website KDD99 is at httpkddicsuciedudatabaseskddcup99kddcup99html Web Spam isat httpwwwccgatecheduprojectsdoiWebbSpamCorpushtml Covertype is at httparchiveicsuciedumldatasetsCovertype MNIST is at httpyannlecuncomexdbmnistSIAM is at httpsc3nasagovdashlinkresources138

To evaluate the performance of CSVM-RDCD methodwe use a stratified selection to split each dataset to 910training and 110 testing We briefly describe each set belowFor each dataset we choose the class with the higher cost orfewer data points as the target or positive class All multiclassdatasets were converted to binary data sets In particular thebinary datasets SIAM1 and SIAM2 are datasets which havebeen constructed from the same multiclass dataset but withdifferent target class and different imbalance ratios

Abstract and Applied Analysis 7

Table 2 Cost-sensitive parameters of three algorithms using CSVMon 6 datasets

Dataset 119862120574 119862(1 minus 120574)

KDD99 (intrusion detection) 4 1Web span 2 1MNIST 10 1Covertype 211 1SIAM1 2000 1SIAM11 716 1

Evaluation of the performance of the three algorithmsusing CSVM was 10-fold cross-validation tested averagemisclassification cost the training time and the recognitionrate of the positive class (ie the recognition rate of theminority class)

Average misclassification cost (AMC) represents 10-foldcross-validation averagemisclassification cost on test datasetsfor related CSVMs described as follows

AMC =119891119901 times 119862120574 + 119891119899 times 119862 (1 minus 120574)

119873 (55)

where 119891119901 and 119891119899 represent the number of the positiveexamples misclassified as the negative examples in the testdataset and the number of the negative examplesmisclassifiedas the positive examples in the test data set respectively119862120574 and 119862(1 minus 120574) denote the cost of the positive examplesmisclassified as the negative examples in the test data setand the cost of the negative examples misclassified as thepositive examples in the test data set respectively119873 denotesthe number of test examples

The recognition rate of the positive class is the numberof classified positive classes and the number of the positiveclasses on testing dataset

Training time in seconds is used to evaluate the conver-gence speeding of three algorithms using CSVM on the samecomputer

Cost-sensitive parameters of three algorithms usingCSVM are specified as the following Table 2 Cost-sensitiveparameters 119862120574 and 119862(1 minus 120574) are valued according to the classratio of datasets namely the class ratio of minority class andmajority class

Three datasets with relative class imbalance are exam-ined Namely KDD99 (intrusion detection) Web span andMNIST datasets are considered Three datasets with severeclass imbalance are examined Namely Covertype SIAM1and SIAM11 datasets are considered The average misclassi-fication cost comparison of three algorithms using CSVMis shown in Figure 1 for each of the datasets The CSVM-RDCD algorithm outperforms the PCSVM and CSVM on alldatasets

The recognition rate of positive class comparison of threealgorithms using CSVM is shown in Figure 2 for each ofthe datasets The CSVM-RDCD algorithm outperforms thePCSVM and CSSVM on all datasets surpasses the PCSVMon four datasets and ties with the PCSVM on two datasets

We examine large datasets with relative imbalance ratiosand severe imbalance ratios to evaluate the convergence

1 2 3 4 5 6003

0035

004

0045

005

0055

006

0065

007

0075

AM

C

AMC

CSVM-RDCDPCSVMCSSVM

Datasets

Figure 1 Average misclassification cost comparison of three algo-rithms using CSVM on 6 datasets

1 2 3 4 5 60972

0974

0976

0978

098

0982

0984

0986

0988

099

0992

Datasets

Reco

gniti

on ra

te o

f pos

itive

clas

s

Recognition rate of positive class

CSVM-RDCDPCSVMCSSVM

Figure 2 The positive class recognition rate comparison of threealgorithms using CSVM

speed of CSVM-RDCD algorithm The training time com-parison of three algorithms using CSVM is shown in Figure 3for each of the datasetsTheCSVM-RDCD algorithm outper-forms the PCSVM and CSSVM on all datasets

52 Experiment on Real-World Big Class-Imbalanced DatasetClassification Problems In order to verify the effectivenessof the proposed algorithm CSVM-RDCD on real-worldbig class-imbalanced data classification problems it was

8 Abstract and Applied Analysis

1 2 3 4 5 60

5

10

15

20

25

30

Trai

ning

tim

e (s)

Training time

Datasets

CSVM-RDCDPCSVMCSSVM

Figure 3 Training time (s) comparison of three algorithms usingCSSVM

Table 3 Specification of the benchmark datasets and cost-sensitiveparameters on the dataset

Dataset Fault diagnosis of wind turbineNumber of ex 20000Number of feat 56Ratio 190Positive class Local crack fault119862120574 190119862(1 minus 120574) 1

evaluated using the real vibration data measured in the windturbine The experimental data were from the SKF Wind-Con software and collected from a wind turbine gearboxtype TF138-A [11] The vibration signals were continuouslyacquired by an accelerometer mounted on the outer case ofthe gearbox

All parameter settings for the dataset are listed in Table 3The statistical results of the big class-imbalanced data prob-lems that measure the quality of results (average misclassi-fication cost recognition rate of positive class and trainingtime) are listed in Table 4 From Table 4 it can be concludedthat CSVM-RDCD is able to consistently achieve superiorperformance in the big class-imbalanced data classificationproblems

Experimental results show that the randomized dual coordinate descent method is applicable to solving the cost-sensitive SVM dual problem on large-scale experimental data sets. The proposed method achieves superior performance in average misclassification cost, recognition rate of the positive class, and training time.

Table 4: Classification performance of three algorithms using CSVM on the dataset.

    Algorithm     AMC       Recognition rate of positive class   Training time
    CSVM-RDCD     0.01825   98.5%                                5.43 s
    PCSVM         0.0334    97.6%                                6.15 s
    CSSVM         0.0379    97.2%                                7.34 s

The large-scale experiments show that the cost-sensitive support vector machine using the randomized dual coordinate descent method runs more efficiently than both PCSVM and CSSVM; in particular, the randomized dual coordinate descent algorithm has an advantage in training time on large-scale data sets. CSSVM needs to build the full gradient and the kernel matrix $Q$, and needs to select a working set through a complex procedure during the solving process of the decomposition algorithm. The decomposition algorithm updates the full gradient as a whole, with computational complexity $O(ln)$ per full gradient update; PCSVM has a similar computational complexity. The randomized dual coordinate descent method updates a single coordinate at a time, with computational complexity $O(n)$ per update, which considerably increases the convergence speed of the proposed method.
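To make the $O(n)$ per-update cost concrete, the following Python sketch shows a randomized dual coordinate descent loop of the kind described above (modeled on the dual coordinate descent method of Hsieh et al. [8], with class-dependent upper bounds standing in for the cost-sensitive constraints; the variable names and details are our assumptions, not the authors' code). Maintaining $w = \sum_i \alpha_i y_i x_i$ makes both the coordinate gradient and the update cost $O(n)$:

    import numpy as np

    def rdcd_csvm(X, y, C_pos, C_neg, epochs=10):
        l, n = X.shape
        alpha = np.zeros(l)
        w = np.zeros(n)                       # w = sum_i alpha_i * y_i * x_i
        U = np.where(y == 1, C_pos, C_neg)    # cost-sensitive box constraints
        Qii = np.einsum('ij,ij->i', X, X)     # diagonal of Q: ||x_i||^2
        for _ in range(epochs):
            for i in np.random.permutation(l):
                if Qii[i] == 0.0:
                    continue
                G = y[i] * X[i].dot(w) - 1.0  # coordinate gradient, O(n)
                new_alpha = min(max(alpha[i] - G / Qii[i], 0.0), U[i])
                w += (new_alpha - alpha[i]) * y[i] * X[i]  # O(n) maintenance of w
                alpha[i] = new_alpha
        return w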

6. Conclusions

The randomized dual coordinate descent method (RDCD) is an optimization algorithm that updates the global solution by solving each subproblem analytically. The RDCD method has a rapid convergence rate, which is mainly due to the following: (1) the subproblem has a closed-form analytical solution, so the solving process requires no complex numerical optimization; (2) each component updated by the RDCD method builds on the previously updated component, so compared with the CSSVM method, which updates the full gradient information as a whole, the objective function of the RDCD method can decline faster; (3) the single coordinate gradient calculation of the RDCD method is simpler and cheaper than the full gradient calculation.
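For reference, the closed-form step in point (1) can be written in the standard form used for dual coordinate descent on the L1-loss SVM dual, with $U_i$ denoting the cost-sensitive upper bound ($C\gamma$ or $C(1-\gamma)$) for coordinate $i$ (the notation here is ours, stated as a sketch):

$$\min_{d}\ \tfrac{1}{2}\,Q_{ii}\,d^{2} + \nabla_i f(\alpha)\,d \quad \text{subject to}\quad 0 \le \alpha_i + d \le U_i,$$

whose solution is the projected Newton step

$$\alpha_i^{\mathrm{new}} = \min\!\left(\max\!\left(\alpha_i - \frac{\nabla_i f(\alpha)}{Q_{ii}},\, 0\right),\, U_i\right).$$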

The randomized dual coordinate descent method is applied to the cost-sensitive support vector machine, which expands the scope of application of the randomized dual coordinate descent method. For the large-scale class-imbalanced problem, a cost-sensitive SVM using the randomized dual coordinate descent method is proposed. Experimental results and analysis show the effectiveness and feasibility of the proposed method.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work is partially supported by the National Science Fund for Distinguished Young Scholars of China (no. 61025015), the National Natural Science Foundation of China (nos. 51305046 and 61304019), the Research Foundation of the Education Bureau of Hunan Province, China (no. 12A007), the Open Fund of Hunan Province University Key Laboratory of Bridge Engineering (Changsha University of Science and Technology), the Key Laboratory of Renewable Energy Electric-Technology of Hunan Province (Changsha University of Science and Technology), the Key Laboratory of Efficient and Clean Energy Utilization, College of Hunan Province, and the Introduction of Talent Fund of Changsha University of Science and Technology.

References

[1] M. A. Davenport, "The 2nu-SVM: a cost-sensitive extension of the nu-SVM," Tech. Rep. TREE 0504, Department of Electrical and Computer Engineering, Rice University, 2005.
[2] M. Kim, "Large margin cost-sensitive learning of conditional random fields," Pattern Recognition, vol. 43, no. 10, pp. 3683-3692, 2010.
[3] Y.-J. Park, S.-H. Chun, and B.-C. Kim, "Cost-sensitive case-based reasoning using a genetic algorithm: application to medical diagnosis," Artificial Intelligence in Medicine, vol. 51, no. 2, pp. 133-145, 2011.
[4] J. Kim, K. Choi, G. Kim, and Y. Suh, "Classification cost: an empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost," Expert Systems with Applications, vol. 39, no. 4, pp. 4013-4019, 2012.
[5] C.-Y. Yang, J.-S. Yang, and J.-J. Wang, "Margin calibration in SVM class-imbalanced learning," Neurocomputing, vol. 73, no. 1-3, pp. 397-411, 2009.
[6] H. Masnadi-Shirazi, N. Vasconcelos, and A. Iranmehr, "Cost-sensitive support vector machines," http://arxiv.org/abs/1212.0975.
[7] Y. Artan, M. A. Haider, D. L. Langer et al., "Prostate cancer localization with multispectral MRI using cost-sensitive support vector machines and conditional random fields," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2444-2455, 2010.
[8] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan, "A dual coordinate descent method for large-scale linear SVM," in Proceedings of the 25th International Conference on Machine Learning, pp. 408-415, July 2008.
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin, "LIBLINEAR: a library for large linear classification," The Journal of Machine Learning Research, vol. 9, pp. 1871-1874, 2008.
[10] T. Eitrich and B. Lang, "Parallel cost-sensitive support vector machine software for classification," in Proceedings of the Workshop from Computational Biophysics to Systems Biology, NIC Series, pp. 141-144, John von Neumann Institute for Computing, Jülich, Germany, 2006.
[11] B. Tang, W. Liu, and T. Song, "Wind turbine fault diagnosis based on Morlet wavelet transformation and Wigner-Ville distribution," Renewable Energy, vol. 35, no. 12, pp. 2862-2866, 2010.
