COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH
THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA
CLASSIFICATION
I-Ling Chen¹, Bor-Chen Kuo¹, Chen-Hsuan Li², Chih-Cheng Hung³
¹ Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
² Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C.
³ School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.
Outline
• Introduction
– Statement of problems
– The objective
• Literature Review
– Support Vector Machines: kernel method
– Multiple Classifier System: random subspace method, dynamic subspace method
– An Optimal Kernel Method for selecting the RBF Kernel Parameter
• Optimal Kernel-based Dynamic Subspace Method
• Experimental Design and Results
• Conclusion and Future Work
INTRODUCTION
Hughes Phenomenon (Hughes, 1968), also called the curse of dimensionality or peaking phenomenon: with a small sample size N and high dimensionality d, classification performance degrades once d grows large relative to N.
Support Vector Machines (SVM)
Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998).
SVM is robust against the Hughes phenomenon (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006).
SVM includes
• Kernel trick
• Support vector learning
The Goal of the Kernel Method for Classification
• Samples in the same class can be mapped into the same area.
• Samples in different classes can be mapped into different areas.
Support Vector Learning
SV learning tries to learn a linear separating hyperplane for a two-class classification problem from a given training set.
Illustration of SV learning with the kernel trick:
φ : original space → feature space (nonlinear feature mapping)
In the feature space, the optimal hyperplane wᵀφ(x) + b = 0 separates the class yᵢ = +1 from the class yᵢ = −1; the margin hyperplanes are wᵀφ(x) + b = +1 and wᵀφ(x) + b = −1, and the support vectors are the training samples lying on the margins.
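As an illustrative sketch (not the authors' implementation), SV learning with an RBF kernel can be run with scikit-learn's SVC; the toy dataset and parameter values below are assumptions for demonstration only.

```python
# Sketch: SV learning with the kernel trick via an RBF-kernel SVM.
# Toy data and parameter values are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# gamma corresponds to 1/(2*sigma^2) in k(x, z) = exp(-||x - z||^2 / (2*sigma^2))
clf = SVC(kernel="rbf", gamma=0.5, C=10.0)
clf.fit(X, y)

# The support vectors are the training samples that define the margins.
print("support vectors:", len(clf.support_))
print("training accuracy:", clf.score(X, y))
```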
Multiple Classifier System
There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets.
(Ho, T. K., 1998; Yang, J-M., Kuo, B-C., Yu, P-T., & Chuang, C-H., 2010)
Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.
Approaches to building classifier ensembles.
THE FRAMEWORK OF THE RANDOM SUBSPACE METHOD (RSM) BASED ON SVM (Ho, 1998)
Given the learning algorithm, SVM, and the ensemble size, S.
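A minimal sketch of the RSM loop described above, assuming a scikit-learn SVM base learner and a synthetic dataset; the ensemble size S and subspace dimensionality r below are illustrative values, not the paper's settings.

```python
# Sketch of the random subspace method (RSM) with SVM base classifiers:
# draw S random feature subsets of size r, train one SVM per subset,
# and fuse predictions by majority vote. All names/values are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=1)
rng = np.random.default_rng(1)
S, r = 10, 10  # ensemble size and subspace dimensionality

members = []
for _ in range(S):
    feats = rng.choice(X.shape[1], size=r, replace=False)  # uniform random subspace
    members.append((feats, SVC(kernel="rbf").fit(X[:, feats], y)))

def predict(samples):
    votes = np.array([clf.predict(samples[:, feats]) for feats, clf in members])
    return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote (binary labels)

print("re-substitution accuracy:", np.mean(predict(X) == y))
```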
THE INADEQUACIES OF RSM
* Irregular rule: Each individual feature potentially possesses different discriminative power for classification. A randomized strategy for selecting features is unable to distinguish between informative features and redundant ones.
* Implicit number: How should a suitable subspace dimensionality be chosen for the SVM? Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.
Random feature selection
• Two importance distributions
– Importance distribution of feature weights (W distribution): models the selection probability of each feature.
– Importance distribution of subspace dimensionality (R distribution): automatically determines the suitable subspace size.
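The two distributions can be used to draw a subspace as sketched below; the W and R distributions here are synthetic placeholders, not the distributions DSM actually learns.

```python
# Sketch of DSM-style subspace sampling: the subspace size is drawn from
# the R distribution, then features are drawn, without replacement, with
# probabilities given by the W distribution. Both distributions below are
# synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(0)
d = 191  # number of features (bands), as in the Washington, DC Mall data

w = rng.random(d)
w /= w.sum()  # W distribution: selection probability of each feature

sizes = np.arange(1, d + 1)
r = np.exp(-0.5 * ((sizes - 20) / 5.0) ** 2)
r /= r.sum()  # R distribution: probability of each subspace dimensionality

size = rng.choice(sizes, p=r)                            # subspace size
subspace = rng.choice(d, size=size, replace=False, p=w)  # weighted features
print("subspace size:", size)
```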
[Figure: R distribution — density (%) vs. dimensionality of subspace, from initialization R0 and after kernel smoothing]
[Figure: W distribution — density (%) vs. feature, from the class separability of LDA for each feature]
[Figure: W distributions — density (%) vs. feature, from the re-substitution accuracy for each feature under the ML, SVM, kNN, and BCC classifiers]
DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010)
THE FRAMEWORK OF DSM BASED ON SVM
Given the learning algorithm, SVM, and the ensemble size, S.
INADEQUACIES OF DSM
* Kernel function: The SVM algorithm provides an effective way to perform supervised classification; however, the kernel function critically influences the performance of SVM.
* Time-consuming: Choosing a proper kernel function, or a better kernel parameter, for SVM is quite important yet ordinarily time-consuming, especially since the updated R distribution in DSM is obtained from the re-substitution accuracy.
The performance of SVM depends on choosing proper kernel functions or proper parameters of a kernel function.
Li, Lin, Kuo, and Chu (2010) presented a novel criterion for automatically choosing a proper parameter σ of the RBF kernel function.
An Optimal Kernel Method for Selecting RBF Kernel Parameter
Gaussian Radial Basis Function (RBF) kernel :
k(x, z) = exp( −‖x − z‖² / (2σ²) ),  σ ∈ ℝ∖{0},  0 < k(x, z) ≤ 1
In the feature space determined by the RBF kernel, the norm of every sample is one, and the kernel values are positive. Hence, the samples will be mapped onto the surface of a hypersphere.
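A short numerical check of the property above (unit norm and positive, bounded kernel values); the sample values are arbitrary.

```python
# Sketch: Gaussian RBF kernel k(x, z) = exp(-||x - z||^2 / (2*sigma^2)).
# Since k(x, x) = 1 for every x, each mapped sample has unit norm in the
# feature space, i.e. all samples lie on the surface of a hypersphere.
import numpy as np

def rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([0.5, -1.0])

print(rbf(x, x))               # 1.0: the squared norm of phi(x)
print(0.0 < rbf(x, z) <= 1.0)  # True: kernel values are in (0, 1]
```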
Kernel-based Dynamic Subspace Method (KDSM)
THE FRAMEWORK OF KDSM
Original dataset X
[Figure: class separability vs. feature (band)]
The optimal RBF kernel algorithm (optimal parameters for each dimension of the kernel function) plus kernel smoothing, applied in the kernel space (L dimensions), yields the kernel-based W distribution and the kernel-based feature-selection distribution Mdist.
Subspaces are drawn from Mdist to form the subspace pool (reduced datasets); multiple classifiers are trained on the subspaces and combined by decision fusion (majority voting).
Repeat until the performance of classification is stable.
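The kernel-smoothing step in the framework can be sketched as smoothing a discrete importance distribution with a Gaussian kernel; the bandwidth and input weights below are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch of kernel smoothing applied to a discrete importance distribution:
# a Gaussian kernel spreads probability mass to neighbouring indices, and
# the result is renormalized so it remains a valid distribution.
# All values are illustrative.
import numpy as np

def kernel_smooth(p, bandwidth=3.0):
    idx = np.arange(len(p))
    K = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / bandwidth) ** 2)
    smoothed = K @ p
    return smoothed / smoothed.sum()  # renormalize to a distribution

raw = np.zeros(50)
raw[[5, 20, 21]] = [0.5, 0.3, 0.2]  # spiky initial weights
p = kernel_smooth(raw)
print("sums to one:", abs(p.sum() - 1.0) < 1e-9)
```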
Experiment Design
Algorithm | Description
SVM_CV | A single SVM without any dimension reduction, using the CV method
SVM_OP | A single SVM without any dimension reduction, using the OP method
DSM_WACC | DSM with the re-substitution accuracy as the feature weights
DSM_WLDA | DSM with the separability of Fisher's LDA as the feature weights
KDSM | Kernel-based dynamic subspace method proposed in this research
OP: the optimal method for choosing the RBF kernel parameter; CV: 5-fold cross-validation.
We use the grid search within a range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper parameter (2σ2) of RBF kernel and a set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper parameter of slack variable to control the margins.
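The search just described can be sketched as follows, with a synthetic dataset standing in for the hyperspectral data; the discretization of the width range into five points is an assumption (the slide gives only the range [0.01, 10]).

```python
# Sketch of the parameter search: a grid over the RBF width 2*sigma^2 in
# [0.01, 10] and the slack penalty C, scored by 5-fold cross-validation.
# The dataset and the width-grid discretization are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

best = (None, None, -1.0)
for two_sigma_sq in np.linspace(0.01, 10, 5):            # width grid
    for C in [0.1, 1, 10, 20, 60, 100, 160, 200, 1000]:  # slack grid
        clf = SVC(kernel="rbf", gamma=1.0 / two_sigma_sq, C=C)
        acc = cross_val_score(clf, X, y, cv=5).mean()
        if acc > best[2]:
            best = (two_sigma_sq, C, acc)

print("best (2*sigma^2, C, CV accuracy):", best)
```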
Hyperspectral Image data
EXPERIMENTAL DATASET
IR Image
Image (No. of bands): Washington, DC Mall (d = 191)
No. of classes: 7
Category (No. of labeled data): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840)
Experimental Results
Method | SVM_CV | SVM_OP | DSM_WACC | DSM_WLDA | KDSM
Case 1 Accuracy (%) | 83.66 | 83.79 | 85.49 | 87.47 | 88.64
Case 1 CPU time (sec) | 30.35 | 3.10 | 6045.31 | 2188.62 | 155.31
Case 2 Accuracy (%) | 86.39 | 87.89 | 88.74 | 89.43 | 92.53
Case 2 CPU time (sec) | 116.02 | 6.65 | 21113.75 | 4883.92 | 308.26
Case 3 Accuracy (%) | 94.69 | 95.31 | 95.94 | 96.94 | 97.43
Case 3 CPU time (sec) | 5858.18 | 376.99 | 1165048.6 | 220121.62 | 17847.7
There are three cases in Washington, DC Mall:
case 1: Ni = 20, N = 140 < d;
case 2: Ni = 40, N = 280 > d;
case 3: Ni = 300, N = 2100 ≫ d
Ni: the number of training samples in class i; N: the number of all training samples.
Experiment Results in Washington, DC Mall
Method | Case 1 Accuracy | Case 1 Ratio | Case 2 Accuracy | Case 2 Ratio | Case 3 Accuracy | Case 3 Ratio
DSM_WACC | 85.49% | 38.924 | 88.74% | 68.493 | 95.94% | 65.277
DSM_WLDA | 87.47% | 14.092 | 89.43% | 15.844 | 96.94% | 12.333
KDSM | 88.64% | 1 | 92.53% | 1 | 97.43% | 1
(Ratio: CPU time relative to KDSM)
The outcome of classification by using various multiple classifier systems:
Classification Maps with Ni =20 in Washington, DC Mall
□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow
SVM_CV SVM_OP
DSM_WACC DSM_WLDA KDSM
Classification Maps (roof) with Ni =40
□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow
SVM_CV SVM_OP
DSM_WACC DSM_WLDA KDSM
Classification Maps with Ni =300 in Washington, DC Mall
□ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow
SVM_CV SVM_OP
DSM_WACC DSM_WLDA KDSM
Conclusions
In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF parameter and the dynamic subspace method within a subspace-selection-based MCS, to improve classification results on high dimensional datasets.
The experimental results showed that the classification accuracies of KDSM are invariably the best among all classifiers in each case of the Washington, DC Mall dataset.
Moreover, the results show that, compared with DSM, KDSM not only obtains more accurate classification but also economizes on computing time.
Thank You