Using SVMs for Scientists and Engineers - PRT Blog


    http://newfolder.github.io/blog/2013/07/24/using-svms/

Using SVMs for Scientists and Engineers

Jul 24th, 2013

In the mid-90s, support vector machines (SVMs) became extremely popular machine learning algorithms due to a number of very nice properties, and because they can achieve state-of-the-art performance on a number of data sets. Although the statistical underpinnings of why SVMs work rely on somewhat abstract statistical theory, modern packages (like LibSVM, and the PRT) make training and using SVMs almost trivial for the average engineer. That said, getting good performance out of an SVM is often not as easy as simply running pre-existing code on your data, and for some data sets, SVM classification may not be appropriate.

This blog entry will serve two purposes: 1) to provide an introduction to practical issues you (as an engineer or scientist) may encounter when using an SVM on your data, and 2) to be the first in a series of similar "for Engineers & Scientists" posts dedicated to helping engineers understand the tradeoffs, assumptions, and practical details of using various machine learning approaches on their data.

Contents

Quick Notes

    SVM Formulation

    Appropriate Data Sets

    SVM Parameters & Notes

    Parameter: Cost (Scalar)

    Parameter: Relative Class Error Weights

    Parameter: Kernel Choice & Associated Parameters

SVM Pre-Processing

    Optimizing Parameters

Some Rules-Of-Thumb

Concluding

Quick Notes

Throughout this post, we'll be using prtClassLibSvm, which is built directly on top of the fantastic LibSVM library, available here:

http://www.csie.ntu.edu.tw/~cjlin/libsvm/

The parameter nomenclature we're using matches theirs pretty closely, so feel free to leverage their

    documentation as well.

SVM Formulation

Typical SVM formulations assume that you have a set of n-dimensional real training vectors, {x_i} for i = 1, ..., N, and corresponding labels {y_i}, y_i \in {-1,1}. Let x_ik represent the kth element of the vector x_i. Also assume that you have a relevant kernel function (https://en.wikipedia.org/wiki/Kernel_methods), P, which takes two input arguments, both n-dimensional real vectors, and outputs a scalar metric: P(x_i,x_j) = z_ij. The most common choice of P is a radial basis function (http://en.wikipedia.org/wiki/Radial_basis_function):

P(x_i,x_j) = exp( -( \sum_{k} (x_ik - x_jk)^2 ) / s^2 )

SVMs perform prediction of new labels by calculating:

f(x) = \hat{y} = ( \sum_{i} w_i * P(x_i,x) - b ) > 0

i.e., the SVM learns a representation for the labels (y) with a linear combination (w) of a set of kernel functions of the training data (x_i) and the test data (x).
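As a concrete illustration, here is the decision rule above sketched in Python/NumPy. The weights and "support vectors" are hypothetical numbers chosen for the example, not output of a real SVM trainer:

```python
import numpy as np

def rbf_kernel(xi, xj, s=1.0):
    # P(x_i, x_j) = exp(-sum_k (x_ik - x_jk)^2 / s^2), as defined above
    return np.exp(-np.sum((xi - xj) ** 2) / s ** 2)

def svm_predict(x, train_x, w, b, s=1.0):
    # f(x) = (sum_i w_i * P(x_i, x) - b) > 0
    total = sum(wi * rbf_kernel(xi, x, s) for wi, xi in zip(w, train_x))
    return bool((total - b) > 0)

# Two hypothetical retained training vectors with opposite-sign weights:
train_x = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
w = [1.0, -1.0]

print(rbf_kernel(train_x[0], train_x[0]))                    # identical vectors -> 1.0
print(svm_predict(np.array([0.1, 0.0]), train_x, w, b=0.0))  # near the +1 vector -> True
```

A test point near the positively weighted vector gets a kernel response near 1 from it and nearly 0 from the distant negative vector, so the weighted sum is positive.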

Appropriate Data Sets

Binary/M-Ary: Typically, SVMs are appropriate for binary classification problems - multi-class problems require some extensions of SVMs, although in the PRT, SVMs can be used in prtClassBinaryToMaryOneVsAll to emulate multi-class classification.

Data: SVM formulations often assume vector-valued training data; however, as long as a suitable kernel function can be constructed, SVMs can be used on arbitrary data (e.g., string-match distances can be used as a kernel for calculating the distances between character strings). Note, however, that SVMs do assume that the kernel used is a Mercer kernel, so some functions are not appropriate as SVM kernels - see http://en.wikipedia.org/wiki/Mercer's_theorem.

Computational Considerations: Depending on the kernel and the particular algorithm under consideration, training an SVM can be very time-consuming for very large data sets. Proper selection of SVM parameters can significantly improve training time. At run-time, SVMs are typically very fast, with computational complexity that grows approximately linearly with the number of training vectors retained with non-zero weights.

SVM Parameters & Notes

As you might imagine, several SVM parameters will have a significant effect on overall classification performance. Good performance requires careful selection of each of these, though some general rules-of-thumb can help provide reasonable performance with a minimum of headaches.

Parameter: Cost (Scalar)

Internally, the SVM is going to try to ignore a whole bunch of your training data by setting the corresponding w_i to zero. This might sound counter-intuitive, but it's very important, because it makes for fast run-time, and also (it turns out) setting a bunch of ws to zero is fundamental to why the SVM performs so well in general (see any number of articles on V-C theory for more information).


Unfortunately, this presents a dilemma - how much should the SVM try to make the ws zero vs. how much should it try to classify your data absolutely perfectly? Fewer zero ws might improve performance on the training set, but reduce the performance of the SVM on an unseen testing set!

    The Cost parameter in the SVM enables you to control this trade off. Higher cost leads to more non-

    zero w vectors, and more correctly classified training points, while lower costs tend to generate w

    vectors with lots of zeros, and slightly worse performance on training data (though performance on

    testing data may be better).

We usually run a number of experiments for different cost values across a range of, say, 0.01 to 100, though if performance has not plateaued at the ends of that range it might make sense to extend it. The following figures show how the SVM decision boundaries change with varying costs in the PRT.

close all;
ds = prtDataGenUnimodal;
c = prtClassLibSvm;
count = 1;
for w = logspace(-2,2,4)
    c.cost = w;
    c = c.train(ds);
    subplot(2,2,count);
    plot(c);
    legend off;
    title(sprintf('Cost: %.2f',c.cost));
    count = count + 1;
end


Parameter: Relative Class Error Weights

In typical discussions of cost, errors in both classes are treated equally - e.g., it's equally bad to call a -1 a 1 and vice-versa. In realistic operations, that may not be the case - for example, failing to detect a landmine is significantly worse than calling a coke can a landmine.

Luckily, SVMs enable us to specify class-specific error costs, so if class 1 has an error cost of 1, and class -1 has an error cost of 100, it's 100x as bad to mistake a -1 for a 1 as the opposite.
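The net effect is easy to see with simple arithmetic: each training error contributes a penalty scaled by its class weight. A hedged Python sketch of that bookkeeping (illustrative numbers, not LibSVM internals):

```python
def weighted_error_penalty(errors_by_class, weights, cost=1.0):
    # Each training error contributes cost * weight[class] to the total
    # penalty the SVM tries to minimize during training.
    # errors_by_class / weights: dicts keyed by class label (-1 or +1).
    return cost * sum(errors_by_class[c] * weights[c] for c in errors_by_class)

weights = {-1: 100.0, +1: 1.0}
# 3 missed landmines (class -1) hurt far more than 3 false alarms (class +1):
print(weighted_error_penalty({-1: 3, +1: 0}, weights))  # 300.0
print(weighted_error_penalty({-1: 0, +1: 3}, weights))  # 3.0
```

With these weights the optimizer will happily trade many class-1 errors to avoid a single class-(-1) error.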

LibSVM implements these class-specific weights using parameters called w-1, w1, etc. In the PRT, these are implemented as a vector property, weight. The following example shows how changing the error weight on class 1 affects the overall SVM contours. Clearly, as the cost on class 1 increases, the SVM spends more effort to correctly classify red elements.

close all;
ds = prtDataGenUnimodal;
c = prtClassLibSvm;
count = 1;
for w = logspace(-1,1,4)
    c.weight = [1 w]; % Class 0: 1, Class 1: w
    c = c.train(ds);
    subplot(2,2,count);
    c.plot();
    legend off;
    title(sprintf('Weight: [%.2f,%.2f]',c.weight(1),c.weight(2)));
    count = count + 1;
end


Parameter: Kernel Choice & Associated Parameters

The proper choice of kernel makes a huge difference in the resulting performance of your classifier. We tend to stick with linear and RBF kernels (kernelType = 0 or 2 in prtClassLibSvm, respectively), but several other options (including hand-made kernels) are also possible. The linear kernel doesn't have any parameters to set, but the RBF has a parameter that can significantly impact performance. In most formulations the parameter is referred to as sigma, but in LibSVM the parameter is gamma, which is equivalent to 1/s^2 in the RBF formula above. For the RBF, you can set gamma to any positive value. You can also use the special character 'k' and specify a coefficient as a string; 'k' will evaluate to the number of features in the data set - e.g., '5k' evaluates to 10 for a 2-dimensional data set.

In general, we find that for normalized data (see below), the default gamma value of 'k' (the number of dimensions) works well.
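The s-vs-gamma bookkeeping is easy to get wrong, so it's worth checking the equivalence numerically. A quick NumPy sanity check, using the RBF definition from this post on one side and LibSVM's exp(-gamma * distance^2) form on the other:

```python
import numpy as np

xi = np.array([1.0, 2.0])
xj = np.array([0.5, 0.0])
s = 2.0

# This post's form: exp(-sum_k (x_ik - x_jk)^2 / s^2)
k_sigma = np.exp(-np.sum((xi - xj) ** 2) / s ** 2)

# LibSVM's form: exp(-gamma * sum_k (x_ik - x_jk)^2), with gamma = 1/s^2
gamma = 1.0 / s ** 2
k_gamma = np.exp(-gamma * np.sum((xi - xj) ** 2))

print(np.isclose(k_sigma, k_gamma))  # True
```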

    The following example code generates 4 example images for SVM decision boundaries for varying

    gamma parameters.

close all;
c = prtClassLibSvm;
count = 1;
d = prtDataGenUnimodal;
for kk = logspace(-1,.5,4)
    c.gamma = sprintf('%.2fk',kk);
    c = c.train(d);
    subplot(2,2,count);
    c.plot();
    title(sprintf('\\gamma = %s',c.gamma));
    legend off;
    count = count + 1;
end

SVM Pre-Processing

Note that for many kernel choices (e.g., RBF, and many others - see http://en.wikipedia.org/wiki/Kernel_methods#Popular_kernels), the kernel output P(x_i,x_j) depends strongly and non-linearly on the magnitudes of the data vectors. E.g., exp(-1000) is not equal to 1000*exp(-1). In fact, if you refer to the RBF equation above, you'll notice that if the squared distance between two vectors approaches 1000*s^2, P(x1,x2) will be dominated by a term like exp(-1000), which by any reasonable metric (and certainly in floating-point precision) is exactly 0. This is a bad thing.

In general, non-linear kernel functions should only be applied to data that is guaranteed to be in a reasonable range (e.g., -10 to 10), or data that has been pre-processed to remove outliers or control for data magnitude. The PRT makes several such techniques available - compare and contrast the performance in the following example:

    close all;

    ds = prtDataGenBimodal;

    ds.X = 100*ds.X; %scale the data

    yOutNaive = kfolds(prtClassLibSvm,ds,3);

    yOutNorm = kfolds(prtPreProcZmuv + prtClassLibSvm,ds,3);

    [pfNaive,pdNaive] = prtScoreRoc(yOutNaive);

    [pfNorm,pdNorm] = prtScoreRoc(yOutNorm);

    h = plot(pfNaive,pdNaive,pfNorm,pdNorm);

    set(h,'linewidth',3);

    legend(h,{'Naive','Pre-Proc'});

    title('ROC Curves for Naive and Pre-Processed Application of SVM to Bimodal Data');

Clearly, performance on un-normalized data is atrocious, but simple re-scaling achieves good results.
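ZMUV (zero-mean, unit-variance) normalization itself is simple: fit per-feature statistics on the training set, then apply those same statistics unchanged to any test data. A minimal NumPy sketch of the idea (not the PRT's prtPreProcZmuv implementation):

```python
import numpy as np

def zmuv_fit(train_x):
    # Per-feature mean and standard deviation, from the training set only
    return train_x.mean(axis=0), train_x.std(axis=0)

def zmuv_apply(x, mu, sigma):
    # Apply training-set statistics to any data (train or test)
    return (x - mu) / sigma

rng = np.random.default_rng(0)
train = 100 * rng.standard_normal((200, 2))  # badly scaled, like ds.X above
mu, sigma = zmuv_fit(train)
normed = zmuv_apply(train, mu, sigma)
print(np.allclose(normed.mean(axis=0), 0))  # True
print(np.allclose(normed.std(axis=0), 1))   # True
```

The key point, which kfolds handles for you in the PRT, is that mu and sigma must come from training folds only; re-fitting them on test data would leak information.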

Optimizing Parameters

The general procedure in developing an SVM is to optimize both the cost and gamma parameters for your particular data set. You can do this using two for-loops and the PRT:

close all;
gammaVec = logspace(-2,1,10);
costVec = logspace(-2,1,10);
ds = prtDataGenUnimodal;
auc = nan(length(gammaVec),length(costVec));
kfoldsInds = ds.getKFoldKeys(3);
for gammaInd = 1:length(gammaVec)


    for costInd = 1:length(costVec)
        c = prtClassLibSvm;
        c.cost = costVec(costInd);
        c.gamma = gammaVec(gammaInd);
        yOut = crossValidate(c,ds,kfoldsInds);
        auc(gammaInd,costInd) = prtScoreAuc(yOut);
        imagesc(auc,[.95 1]);
        colorbar; drawnow;
    end
end
title('AUC vs. Gamma Index (Vertical) and Cost Index (Horizontal)');
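The PRT snippet above visualizes the AUC surface; to actually select operating parameters you take the argmax over the grid. A toy Python sketch of that last step, with a stand-in scoring function in place of cross-validated AUC (the peak location is contrived for illustration):

```python
import numpy as np

gamma_vec = np.logspace(-2, 1, 10)
cost_vec = np.logspace(-2, 1, 10)

def score(gamma, cost):
    # Stand-in for cross-validated AUC; peaks near gamma = 1, cost = 1.
    return 1.0 - 0.01 * (np.log10(gamma) ** 2 + np.log10(cost) ** 2)

# Fill the grid, then pick the (gamma, cost) pair with the highest score
auc = np.array([[score(g, c) for c in cost_vec] for g in gamma_vec])
best_g, best_c = np.unravel_index(np.argmax(auc), auc.shape)
print(gamma_vec[best_g], cost_vec[best_c])  # the best-scoring pair
```

In practice you would refine the grid around the winning pair, and report performance on data never touched by the search.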

Some Rules-Of-Thumb

In general, you may not have time to optimize over your SVM parameters, or may simply not want to. In that case, you can usually get by using ZMUV pre-processing and the default SVM parameters (RBF kernel, cost = 1, gamma = 'k'):

algo = prtPreProcZmuv + prtClassLibSvm;


Concluding

We hope this entry helps you make sense of how to use an SVM in real-world scenarios, and how to optimize the SVM parameters for your particular data set. As always, proper cross-validation is fundamental to good generalizability.

Happy coding.

    Posted by Pete Jul 24th, 2013

9 Comments

    Sunil Dadhich

    Why do we need to optimize the C and g values in SVM?

    Peter Torrione Hi Sunil, The parameters in the SVM control the relative tradeoffs between sparsity and

    accuracy on the training data set - even though the default parameters may work well, they

    are not guaranteed to work ideally on all data sets. As a result, optimizing the parameters is

    recommended. Not sure if that answers your question...

    Mauro Baldi

    Hello and many thanks again for the previous help!

This time I tried to build and compare different classifiers with the fantastic PRT you developed, and the last classifier is an SVM.

I thoroughly read this guide and I tried, at first, to skip the "manual" pre-processing phase. Instead, I used the ZMUV pre-processing which, as stated in the guide, avoids the need to optimize the SVM parameters manually.

    Nevertheless, the resulting ROC curves are not as satisfactory as those coming from the

    other classifiers.

What I am wondering is whether this is normal (as I skipped a more detailed preprocessing) or whether there might be something wrong with my code.

    My code is:


%% CLASSIFIER (PREPROCZMUV + SVM) %%

    algoSVM = prtPreProcZmuv + prtClassLibSvm;

    algoSVM = algoSVM.train(TrainingSet);

    %% TEST %%

    yOutTest = algoSVM.run(TestSet);

    kennethmorton Mod

    I don't see anything immediately wrong with your code. The default options for

    LibSVM uses an RBF kernel. If your data is high dimensional you may need to use

    something to reduce the dimensionality first. Have any other kernel classifiers

    worked?

    Mauro Baldi

    Hello Kenny and thank you for your reply.

    My data set is not very big. It consists of 1393 rows, 3 columns (the features)

    and the corresponding target values (either 0 or 1).

    So far I used the RBF kernel as default. I am trying to change the kernel type.

    In particular, I read in the help that the kernel attribute is kernelType.

    But if I type

    algoSVM.kernelType = 0;

    to set a linear kernel the following error code appers:

    No public field kernelType exists for class prtAlgorithm.

    So this means that the kernelType attribute is a private one and might be

    changed through a set method.

    How can I do that?

I also have several questions about this procedure and I apologize in advance if the message is too long.

    kennethmorton Mod

    Mauro,

When you use the following line:


    >> algoSVM = prtPreProcZmuv + prtClassLibSvm;

    you are constructing a prtAlgorithm. This is why the properties of the

SVM cannot be set directly using algoSVM. Referencing the individual

    components of the algorithm can be done by accessing the actionCell

    property of prtAlgorithm

    >> algoSVM.actionCell{2}.kernelType = 0;

    In general I don't like to do things this way. I find it is cleaner to

    construct the algorithm with the properties you want using string value

    pairs. For example

    >> algoSVM = prtPreProcZmuv + prtClassLibSvm('kernelType',0);

1. I am confused by your code. Should there be two SVM algorithms.

    Mauro Baldi

    Hello Kenny and thank you very much for your, as always, fast and

    very detailed replies.

    My goal is this: I have a data set made up of a training set and a test

    set.

What I would like to do is to build many classifiers (including SVMs) and, at the end, pick the most promising one.

So far I have built RVM, KNN and SVM classifiers, all thanks to your PRT toolbox and help. So, I am really very grateful to you and Peter.

    Although this post is just devoted to SVM, I have questions both on

    SVMs but also on other issues I have encountered while trying to

    implement your suggestions.

Therefore, I'd like to ask you whether I can contact you or Peter privately.

    Anyway, here are my questions:

    1) You asked me " I am confused by your code. Should there be two

    SVM algorithms. One with a linear kernel and one with an RBF

    kennethmorton Mod

    Mauro,


    This is getting a bit detailed for the comments section. Let's talk this

    offline. Please feel free to email me at [email protected]

    Kenny

    Mauro Baldi

    Hello Kenny,

    this time I am writing here because the questions I am gonna ask

    might interest other people.

    In a previous post you said that it is not a problem if you calibrate a

    SVM with RBF kernel with or without any preprocessing.

    Just to check, I tried the following calibrations:

    1) Preprocessing with prtPreProcZmuv and automatic training (i.e.,

    without the double loop on parameters Cost and gamma)

    2) Manual calibration with prtPreProcPca preprocessing:

    algoSVManual = prtPreProcPca + prtClassLibSvm;

    3) Manual calibration without any preprocessing

    algoSVManual = prtClassLibSvm;

    4) Manual calibration with prtPreProcZmuv preprocessing:

Copyright 2013 - Kenneth Morton and Peter Torrione - Powered by Octopress - Theme by Brian Armstrong
