Towards Scalable Support Vector Machines Using Squashing


• Authors: Dmitry Pavlov, Darya Chudova, Padhraic Smyth

• Information and Computer Science, University of California

• Advisor: Dr. Hsu

• Reporter: Hung Ching-Wen

Outline

• 1. Motivation

• 2. Objective

• 3. Introduction

• 4. SVM

• 5. Squashing for SVM

• 6. Experiments

• 7. Conclusion

Motivation

• SVMs provide classification models with a strong theoretical foundation and excellent empirical performance.

• The major drawback of SVMs, however, is the need to solve a large-scale quadratic programming problem during training.

Objective

• This paper combines likelihood-based squashing with a probabilistic formulation of SVMs, enabling fast training on squashed data sets.

Introduction

• The applicability of SVMs to large datasets is limited because of their high computational cost.

• Speed-up training algorithms: chunking, Osuna's decomposition method, SMO

• These methods can accelerate training, but they do not scale well with the size of the training data.

Introduction

• Reducing the computational cost:

• Sampling

• Boosting

• Squashing (DuMouchel et al., Madigan et al.)

• The authors propose Squashing-SMO to address the high computational cost of SVMs.

SVM

• Training data: D = {(x_i, y_i) : i = 1, …, N}, where x_i is a feature vector and y_i ∈ {+1, −1}

• In a linear SVM, the separating classifier is y = sign(⟨w, x⟩ + b)

• w is the normal vector of the hyperplane

• b is the intercept of the hyperplane
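A minimal sketch of this decision rule in Python (NumPy assumed; w and b stand in for an already-trained model, and the function names are illustrative):

```python
import numpy as np

def linear_svm_predict(X, w, b):
    """Classify rows of X with y = sign(<w, x> + b); labels are in {+1, -1}."""
    return np.sign(X @ w + b)

def functional_margins(X, y, w, b):
    """Margins y_i * (<w, x_i> + b); points with margin < 1 violate the margin."""
    return y * (X @ w + b)
```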

SVM (non-separable)
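The formula on this slide was an image and did not survive extraction; for reference, the standard soft-margin (non-separable) primal that the paper builds on is

```latex
\min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.}\quad y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1 - \xi_i,\;\; \xi_i \ge 0 .
```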

SVM (a prior on w)
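This slide's formula was likewise lost. One common probabilistic reading, consistent with the prior P(w) ∝ exp(−‖w‖²) used later in these slides, treats SVM training as maximum-a-posteriori estimation: minimizing the soft-margin objective is equivalent to maximizing

```latex
P(w, b \mid D) \;\propto\; e^{-\|w\|^2}
\prod_{i=1}^{N} \exp\Bigl(-C \max\bigl(0,\; 1 - y_i(\langle w, x_i\rangle + b)\bigr)\Bigr).
```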

Squashing for SVM

• (1). Select a probabilistic model P((X, Y) | θ)

• (2). The objective is to find the maximum-likelihood estimate θ_ML
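In symbols, assuming i.i.d. data, the maximum-likelihood estimate is

```latex
\theta_{ML} = \arg\max_{\theta} \sum_{i=1}^{N} \log P\bigl((x_i, y_i) \mid \theta\bigr).
```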

Squashing for SVM

• (3). The training data D = {(x_i, y_i) : i = 1, …, N} can be grouped into N_c groups

• (X_c, Y_c)^sq: the squashed data point placed at cluster c

• β_c: the weight of the squashed point for cluster c
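A simplified sketch of this grouping step in Python, using plain k-means within each class (the paper's own grouping is likelihood-based, as described under the design issues below; the names and the use of scikit-learn are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def squash_by_class(X, y, n_clusters):
    """Cluster each class separately and return one weighted pseudo-point
    per cluster: the centroid plays the role of (X_c, Y_c)^sq and the
    cluster size plays the role of the weight beta_c."""
    X_sq, y_sq, beta = [], [], []
    for label in (+1, -1):
        Xl = X[y == label]
        km = KMeans(n_clusters=n_clusters, n_init=10).fit(Xl)
        for c in range(n_clusters):
            members = Xl[km.labels_ == c]
            X_sq.append(members.mean(axis=0))  # squashed point for cluster c
            y_sq.append(label)
            beta.append(len(members))          # weight beta_c = cluster size
    return np.array(X_sq), np.array(y_sq), np.array(beta)
```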

Squashing for SVM

• If the prior on w is taken to be P(w) ∝ exp(−‖w‖²)

Squashing for SVM

• (4). The optimization model for the squashed data:
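The slide's formula was lost in extraction; the natural reading is a weighted soft-margin objective in which each squashed point's slack is scaled by its weight β_c:

```latex
\min_{w,\,b,\,\xi}\;\; \tfrac{1}{2}\|w\|^2 + C \sum_{c=1}^{N_c} \beta_c\, \xi_c
\quad \text{s.t.}\quad y_c\bigl(\langle w, x_c\rangle + b\bigr) \ge 1 - \xi_c,\;\; \xi_c \ge 0 .
```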

Squashing for SVM

• Important design issues for the squashing algorithm (see the sketch after this list):

• (1). The choice of the number and location of the squashing points

• (2). Sample values of w from the prior p(w)

• (3). b can be obtained from the optimization model

• (4). With w and b fixed, evaluate the likelihood of each training point, and repeat the selection procedure L times (L is the length of the likelihood profile)
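A sketch of steps (2)-(4) in Python, under stated assumptions: the per-point log-likelihood is taken to be the hinge-based pseudo-log-likelihood from the probabilistic reading above, b is fixed at 0 for simplicity, and all names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def likelihood_profiles(X, y, L=10, C=1.0, seed=0):
    """Draw L parameter vectors w from the prior P(w) ~ exp(-||w||^2) and
    evaluate every training point's log-likelihood under each draw,
    giving an N x L profile matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # exp(-||w||^2) is a Gaussian with variance 1/2 per coordinate.
    W = rng.normal(scale=np.sqrt(0.5), size=(L, d))
    margins = y[:, None] * (X @ W.T)              # shape (N, L)
    return -C * np.maximum(0.0, 1.0 - margins)    # hinge-based pseudo-log-likelihood

def group_by_profile(profiles, n_groups):
    """Group points with similar likelihood profiles (step (1) above)."""
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(profiles)
```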

Experiments

• Experiment datasets:

• Synthetic data

• UCI Machine Learning repository

• UCI KDD repository

Experiments

• Evaluated methods:

• full-SMO, srs-SMO (simple random sampling), squash-SMO, boost-SMO

• Run: results averaged over 100 runs

• Performance measures:

• Misclassification rate, learning time, and memory usage
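A hypothetical harness for this comparison (not the paper's code; train_fn stands in for any of the four methods and is assumed to return a predict(X) callable):

```python
import time
import numpy as np

def evaluate(train_fn, X_train, y_train, X_test, y_test, n_runs=100):
    """Repeat training n_runs times, recording the misclassification
    rate on held-out data and the wall-clock learning time."""
    errors, times = [], []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        predict = train_fn(X_train, y_train)
        times.append(time.perf_counter() - t0)
        errors.append(np.mean(predict(X_test) != y_test))
    return np.mean(errors), np.mean(times)
```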

Experiments (Results on Synthetic Data)

• (w_f, b_f): parameters estimated by full-SMO

• (w_s, b_s): parameters estimated from the squashed or sampled data


Experiments (Results on Benchmark Data)

[The benchmark-result slides were figure-only and did not survive extraction.]

Conclusion

• 1. We describe how the use of squashing makes SVM training applicable to large datasets.

• 2. Comparison with full-SMO shows that squash-SMO and boost-SMO achieve near-optimal performance with much lower time and memory costs.

• 3. srs-SMO has a higher misclassification rate.

• 4. squash-SMO and boost-SMO allow parameter tuning by cross-validation, which is infeasible for full-SMO.

Conclusion

• 5. The performance of squash-SMO and boost-SMO is similar on the benchmark problems.

• 6. However, squash-SMO offers better interpretability of the model and can be expected to run faster on datasets that do not reside in memory.

Opinion

• It is a good idea: the authors show how the use of squashing makes SVM training applicable to large datasets.

• We could vary the prior distribution of w according to the nature of the data, e.g., using an exponential distribution, a log-normal distribution, or a nonparametric approach.
