
Page 1

Incremental Reduced Support Vector Machines

Yuh-Jye Lee, Hung-Yi Lo and Su-Yun Huang
National Taiwan University of Science and Technology and Institute of Statistical Science, Academia Sinica

2003 International Conference on Informatics, Cybernetics, and Systems, ISU, Kaohsiung, Dec. 14, 2003

Page 2

Outline

Support Vector Machines for classification problems: linear and nonlinear SVMs

Difficulties with nonlinear SVMs for large problems: storage and computational complexity

Reduced Support Vector Machines

Incremental Reduced Support Vector Machines

Numerical Results

Conclusions

Page 3

Support Vector Machines (SVMs)

Powerful tools for Data Mining

SVMs have a sound theoretical foundation, based on statistical learning theory

SVMs can be generated very efficiently and have high accuracy

SVMs have an optimally defined separating surface

SVMs have become the most promising learning algorithm for classification and regression

SVMs can be extended from the linear to the nonlinear case by using kernel functions

Page 4

Support Vector Machines for Classification
Maximizing the Margin between Bounding Planes

[Figure: the two classes A+ and A- separated by two parallel bounding planes, with the margin between them maximized]

Page 5

Support Vector Machine Formulation

Solve the quadratic program for some $\nu > 0$:

$\min_{(w,\gamma,y)} \ \nu\, e^\top y + \tfrac{1}{2}\, w^\top w$
s.t. $D(Aw - e\gamma) + y \ge e, \quad y \ge 0$ (QP)

where the rows of $A \in \mathbb{R}^{m \times n}$ are the training points, $y$ is the slack vector, $e$ is a vector of ones, and $D$ is a diagonal matrix whose entries $D_{ii} = +1$ or $-1$ denote class membership.

SSVM: Smooth Support Vector Machine is an efficient SVM algorithm proposed by Yuh-Jye Lee
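For background on the smoothing step (a sketch following the published SSVM formulation of Lee and Mangasarian, which may differ in detail from the slide's original formula): SSVM substitutes the slack $y = (e - D(Aw - e\gamma))_+$ into the objective and replaces the plus function with the smooth approximation $p(x, \alpha)$, yielding an unconstrained problem solvable by Newton's method:

$$\min_{w,\gamma}\ \frac{\nu}{2}\,\big\|\,p\big(e - D(Aw - e\gamma),\ \alpha\big)\big\|_2^2 + \frac{1}{2}\big(w^\top w + \gamma^2\big), \qquad p(x,\alpha) = x + \frac{1}{\alpha}\log\!\big(1 + e^{-\alpha x}\big).$$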

Page 6

Nonlinear Support Vector Machine

Extend to nonlinear cases by using kernel functions

Nonlinear Support Vector Machine formulation:

$\min_{(u,\gamma,y)} \ \nu\, e^\top y + \tfrac{1}{2}\, u^\top u$
s.t. $D(K(A, A^\top)Du - e\gamma) + y \ge e, \quad y \ge 0$

Map data from the input space to a higher dimensional feature space where the data can be separated linearly

The value of the kernel function represents the inner product in the feature space
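A minimal NumPy sketch of a kernel evaluation; the Gaussian kernel choice, parameter value, and function name are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """K[i, j] = exp(-mu * ||A_i - B_j||^2) for rows A_i of A, B_j of B."""
    sq = (np.sum(A**2, axis=1)[:, None]       # ||A_i||^2
          + np.sum(B**2, axis=1)[None, :]     # ||B_j||^2
          - 2.0 * A @ B.T)                    # -2 A_i . B_j
    return np.exp(-mu * np.maximum(sq, 0.0))  # clip round-off negatives

A = np.random.randn(500, 10)
K_full = gaussian_kernel(A, A)  # full kernel K(A, A^T): m x m
print(K_full.shape)             # (500, 500)
```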

Page 7

Difficulties with Nonlinear SVM for Large Problems

Separating surface depends on almost the entire dataset

Need to store the entire dataset after solving the problem

The nonlinear kernel $K(A, A^\top)$ is fully dense

Long CPU time to compute the $m^2$ numbers

Runs out of memory while storing the $m \times m$ kernel matrix (for example, $m = 100{,}000$ points give $10^{10}$ entries, about 80 GB in double precision)

Computational complexity depends on $m$; the complexity of nonlinear SSVM is roughly $O(m^3)$

Page 8

Reduced Support Vector Machines
Overcoming Computational & Storage Difficulties by Using a Rectangular Kernel

Choose a small random sample $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of the rows of $A$

The small random sample is a representative sample of the entire dataset; typically $\bar{m}$ is 1% to 10% of the rows of $A$

Replace the full kernel $K(A, A^\top)$ by the rectangular kernel $K(A, \bar{A}^\top)$, with corresponding $\bar{u}$, in the nonlinear SSVM

Only need to compute and store $m \times \bar{m}$ numbers for the rectangular kernel

Computational complexity reduces accordingly, since it depends on $\bar{m}$ rather than $m$

The nonlinear separator only depends on $\bar{A}$
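A sketch of the rectangular kernel computation, reusing the Gaussian kernel from the earlier sketch; the 1%-10% sampling rate follows the slide, while the sizes and names are illustrative:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

m, mbar, n = 500, 25, 10             # mbar is 5% of m, within the 1%-10% range
A = np.random.randn(m, n)

idx = np.random.choice(m, size=mbar, replace=False)
Abar = A[idx]                        # small random sample of the rows of A

K_rect = gaussian_kernel(A, Abar)    # m x mbar rectangular kernel
print(K_rect.shape)                  # (500, 25): stores m*mbar numbers, not m*m
```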

Page 9

The reduced set $\bar{A}$ plays the most important role in RSVM

It is natural to raise two questions:

Is there a way to choose the reduced set, other than random selection, so that RSVM will have better performance?

Is there a mechanism that determines the size of the reduced set automatically or dynamically?

Incremental reduced support vector machine is proposed to answer these questions

Page 10

Our Observations (Ⅰ)

The nonlinear separating surface is a linear combination of a set of kernel functions:

$\sum_{i=1}^{\bar{m}} \bar{u}_i\, K(x, \bar{A}_i^\top) - \gamma = 0$

If the kernel functions are very similar, the hypothesis space spanned by these kernel functions will be very limited
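A small numerical illustration of this observation (illustrative, not from the slides): kernel vectors of nearly identical points are nearly linearly dependent, so they span far less than their count suggests.

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-mu * np.maximum(sq, 0.0))

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))

# 30 reduced-set candidates from one tight cluster vs. 30 spread-out points.
close = A[0] + 0.01 * rng.standard_normal((30, 5))   # nearly identical points
spread = rng.standard_normal((30, 5))                # well-separated points

print(np.linalg.matrix_rank(gaussian_kernel(A, close)))   # small: near 1
print(np.linalg.matrix_rank(gaussian_kernel(A, spread)))  # close to 30
```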

Page 11

Our Observations (Ⅱ)

Start with a very small reduced set, then add a new data point only when its kernel function is dissimilar to the current set of kernel functions

These points contribute the most extra information

Page 12

How to measure the dissimilarity? By solving least squares problems

The information criterion: the distance from the new kernel vector to the column space of the current reduced kernel matrix $K(A, \bar{A}^\top)$ is greater than a threshold

This distance can be determined by solving a least squares problem

Page 13

Dissimilarity Measurement: Solving Least Squares Problems

$\min_{\beta} \ \big\| K(A, \bar{A}^\top)\,\beta - k \big\|_2^2$, where $k$ is the kernel vector of the candidate point

It has a unique solution $\beta^* = (\bar{K}^\top \bar{K})^{-1} \bar{K}^\top k$ (writing $\bar{K} = K(A, \bar{A}^\top)$, which has full column rank by construction), and the distance is $\big\| \bar{K}\beta^* - k \big\|_2$
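A minimal NumPy sketch of this distance computation; the function and variable names are illustrative:

```python
import numpy as np

def distance_to_column_space(K_reduced, k_new):
    """Distance from k_new to the column space of K_reduced, via the
    least squares problem min_beta ||K_reduced @ beta - k_new||."""
    beta, *_ = np.linalg.lstsq(K_reduced, k_new, rcond=None)
    return np.linalg.norm(K_reduced @ beta - k_new)

K_reduced = np.random.randn(100, 5)
print(distance_to_column_space(K_reduced, K_reduced @ np.ones(5)))  # ~0: in the span
print(distance_to_column_space(K_reduced, np.random.randn(100)))    # clearly > 0
```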

Page 14

IRSVM Algorithm pseudo-code (sequential version)

1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For each data point not in the reduced set
4      Compute its kernel vector
5      Compute the distance from the kernel vector
6          to the column space of the current reduced kernel matrix
7      If its distance exceeds a certain threshold
8          Add this point into the reduced set and form the new reduced kernel matrix
9  Until several successive failures happen in line 7
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
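A hedged Python sketch of selection steps 1-9 above; the kernel function, the threshold value, and the count taken as "several successive failures" are illustrative assumptions:

```python
import numpy as np

def irsvm_select(A, kernel, threshold=1e-2, max_failures=20):
    """Grow the reduced set incrementally: keep a candidate only when its
    kernel vector is far from the span of the current reduced kernel matrix."""
    order = np.random.permutation(A.shape[0])
    reduced = list(order[:2])                 # step 1: two random initial points
    failures = 0
    for i in order[2:]:                       # step 3: scan the remaining points
        K_red = kernel(A, A[reduced])         # steps 2/8: current reduced kernel
        k_new = kernel(A, A[[i]])[:, 0]       # step 4: candidate's kernel vector
        beta, *_ = np.linalg.lstsq(K_red, k_new, rcond=None)
        dist = np.linalg.norm(K_red @ beta - k_new)   # steps 5-6
        if dist > threshold:                  # step 7
            reduced.append(i)                 # step 8
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:      # step 9: successive failures
                break
    return reduced
```

With the gaussian_kernel sketch from earlier, `reduced = irsvm_select(A, gaussian_kernel)` returns row indices of $A$ to serve as $\bar{A}$.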

Page 15

Speed up IRSVM

Note we have to solve the least squares problem many times, each with time complexity $O(m\bar{m}^2)$

The main cost, factorizing the current reduced kernel matrix, does not depend on the candidate kernel vector (the right-hand side)

Taking advantage of this fact, we propose a batch version of IRSVM that examines a batch of points at once

Page 16

IRSVM Algorithm pseudo-code (batch version)

1  Randomly choose two data points from the training data as the initial reduced set
2  Compute the reduced kernel matrix
3  For a batch of data points not in the reduced set
4      Compute their kernel vectors
5      Compute the corresponding distances from these kernel vectors
6          to the column space of the current reduced kernel matrix
7      For those points whose distances exceed a certain threshold
8          Add those points into the reduced set and form the new reduced kernel matrix
9  Until no data points in a batch were added in lines 7-8
10 Solve the QP problem of nonlinear SVMs with the obtained reduced kernel
11 A new data point is classified by the separating surface
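A sketch of the batch variant: np.linalg.lstsq accepts many right-hand sides at once, so every kernel vector in a batch is tested against the same reduced kernel matrix before it is updated (batch size and threshold are illustrative assumptions):

```python
import numpy as np

def irsvm_select_batch(A, kernel, threshold=1e-2, batch_size=50):
    """Batch IRSVM selection: one least squares solve per batch of candidates."""
    order = np.random.permutation(A.shape[0])
    reduced = list(order[:2])                    # two random initial points
    for start in range(2, len(order), batch_size):
        batch = order[start:start + batch_size]
        K_red = kernel(A, A[reduced])            # factorized once per batch
        K_batch = kernel(A, A[batch])            # candidate kernel vectors (columns)
        beta, *_ = np.linalg.lstsq(K_red, K_batch, rcond=None)
        dists = np.linalg.norm(K_red @ beta - K_batch, axis=0)
        added = batch[dists > threshold]         # all points over the threshold
        if added.size == 0:                      # nothing added in this batch: stop
            break
        reduced.extend(added.tolist())
    return reduced
```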

Page 17

IRSVM on four public data sets

Page 18

Conclusions

IRSVM: an advanced algorithm of RSVM

Starts with an extremely small reduced set and sequentially expands it to include informative data points

Determines the size of the reduced set automatically and dynamically, with no need to pre-specify it

The reduced set generated by IRSVM will be more representative

All advantages of RSVM for dealing with large scale nonlinear classification problems are retained

Experimental tests show that IRSVM uses a smaller reduced set without sacrificing classification accuracy