SVM classifiers
Binary classification
Given training data (x_i, y_i) for i = 1 . . . N, with x_i ∈ R^d and y_i ∈ {-1, 1},
learn a classifier f(x) such that
f(x_i) ≥ 0 if y_i = +1, and f(x_i) < 0 if y_i = -1
i.e. y_i f(x_i) > 0 for a correct classification.
Linear separability
[Figure: example data sets, one linearly separable and one not linearly separable]
Linear classifiers
A linear classifier has the form
f(x) = w^T x + b
• In 2D the discriminant is a line
• w is the normal to the line, and b the bias
• w is known as the weight vector
Linear classifiers
A linear classifier has the form
f(x) = w^T x + b
• In 3D the discriminant is a plane, and in nD it is a hyperplane
For a K-NN classifier it was necessary to "carry" the training data
For a linear classifier, the training data is used to learn w and then discarded
Only w is needed for classifying new data
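As a minimal sketch (NumPy; the weight vector and bias below are made-up values, only to illustrate that classification needs nothing but w and b):

    import numpy as np

    # Illustrative, already-learned parameters (assumed values, not from the slides)
    w = np.array([1.5, -0.5])   # weight vector: normal to the decision boundary
    b = -0.2                    # bias

    def classify(x):
        # linear discriminant f(x) = w^T x + b; the sign gives the predicted class
        return +1 if w @ x + b >= 0 else -1

    print(classify(np.array([1.0, 0.5])))   # prints +1: the point lies on the positive side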
Reminder: The Perceptron Classifier
Given linearly separable data x_i labelled into two categories y_i ∈ {-1, 1}, find a weight vector w such that the discriminant function
f(x_i) = w^T x_i + b
separates the categories for i = 1, . . . , N
• How can we find this separating hyperplane?
The Perceptron Algorithm
Write the classifier as f(x_i) = w̃^T x̃_i + w_0 = w^T x_i, where w = (w̃, w_0) and x_i = (x̃_i, 1)
• Initialize w = 0
• Cycle through the data points {x_i, y_i}
• If x_i is misclassified then w ← w + α y_i x_i
• Until all the data is correctly classified
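A minimal NumPy sketch of this loop (the toy data and learning rate α are assumptions for illustration; the bias is absorbed by appending a constant 1 to each x_i, as above):

    import numpy as np

    # toy 2D data, each point augmented with a constant 1 so the bias is part of w
    X = np.array([[ 2.0,  1.0, 1.0],
                  [ 1.0,  3.0, 1.0],
                  [-1.0, -2.0, 1.0],
                  [-2.0, -1.0, 1.0]])
    y = np.array([1, 1, -1, -1])

    w = np.zeros(X.shape[1])   # initialize w = 0
    alpha = 1.0                # learning rate

    converged = False
    while not converged:       # until all the data is correctly classified
        converged = True
        for x_i, y_i in zip(X, y):          # cycle through the data points
            if y_i * (w @ x_i) <= 0:        # x_i is misclassified (or on the boundary)
                w = w + alpha * y_i * x_i   # perceptron update
                converged = False

    print("learned weight vector (w~, w_0):", w)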
For example in 2D
• If the data is linearly separable, then the algorithm will converge
• Convergence can be slow …
• Separating line close to training data
• We would prefer a larger margin for generalization
• Maximum margin solution: most stable under perturbations of the inputs
Support Vector Machine
SVM – sketch derivation
• Since w^T x + b = 0 and c(w^T x + b) = 0 define the same plane, we have the freedom to choose the normalization of w
• Choose a normalization such that w^T x_+ + b = +1 and w^T x_- + b = -1 for the positive and negative support vectors respectively
• Then the margin is given by
(w / ||w||) · (x_+ - x_-)  =  w^T (x_+ - x_-) / ||w||  =  2 / ||w||
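Spelling the last step out as a short worked derivation (subtracting the two support-vector equations and projecting the difference onto the unit normal):

    \begin{aligned}
    (w^\top x_+ + b) - (w^\top x_- + b) &= (+1) - (-1) \\
    w^\top (x_+ - x_-) &= 2 \\
    \text{margin} = \frac{w^\top}{\lVert w \rVert}\,(x_+ - x_-) &= \frac{2}{\lVert w \rVert}
    \end{aligned}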
Support Vector Machine: linearly separable data
SVM – Optimization
• Learning the SVM can be formulated as an optimization:
max_w 2/||w||  subject to  w^T x_i + b ≥ +1 if y_i = +1, and ≤ -1 if y_i = -1,
for i = 1 . . . N
• Or equivalently
min_w ||w||^2  subject to  y_i (w^T x_i + b) ≥ 1 for i = 1 . . . N
• This is a quadratic optimization problem subject to linear constraints, and there is a unique minimum
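A small numerical sketch of this quadratic program (using scipy.optimize with the SLSQP solver on assumed toy data; in practice a dedicated QP or SVM solver would be used):

    import numpy as np
    from scipy.optimize import minimize

    # toy linearly separable 2D data (assumed for illustration)
    X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
    y = np.array([1, 1, -1, -1])
    d = X.shape[1]

    # variables: theta = (w_1, ..., w_d, b); minimize ||w||^2
    def objective(theta):
        w = theta[:d]
        return w @ w

    # constraints: y_i (w^T x_i + b) - 1 >= 0 for every training point
    constraints = [{"type": "ineq",
                    "fun": lambda theta, i=i: y[i] * (X[i] @ theta[:d] + theta[d]) - 1.0}
                   for i in range(len(y))]

    res = minimize(objective, x0=np.zeros(d + 1), method="SLSQP", constraints=constraints)
    w, b = res.x[:d], res.x[d]
    print("w =", w, " b =", b, " margin =", 2 / np.linalg.norm(w))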
Linear separability again: What is the best w?
• The points can be linearly separated but there is a very narrow margin
• In general there is a trade-off between the margin and the number of mistakes on the training data
• But possibly the large margin solution is better, even though one constraint is violated
Introduce "slack" variables
"Soft" margin solution
The optimization problem becomes
min_{w ∈ R^d, ξ_i ∈ R^+}  ||w||^2 + C Σ_i ξ_i
subject to
y_i (w^T x_i + b) ≥ 1 - ξ_i for i = 1 . . . N
• Every constraint can be satisfied if ξ_i is sufficiently large
• C is a regularization parameter:
- small C allows constraints to be easily ignored → large margin
- large C makes constraints hard to ignore → narrow margin
- C = ∞ enforces all constraints: hard margin
• This is still a quadratic optimization problem and there is a unique minimum.
Note, there is only one parameter, C.
• Data is linearly separable
• But only with a narrow margin
C = ∞: hard margin
C = 10: soft margin
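A hedged illustration of the effect of C, using scikit-learn's SVC with a linear kernel (the toy data and the particular C values are assumptions; a very large C approximates the hard margin):

    import numpy as np
    from sklearn.svm import SVC

    # toy 2D data that is separable only with a narrow margin (assumed example)
    X = np.array([[ 2.0,  2.0], [ 3.0,  3.0], [ 2.5,  1.5],
                  [-2.0, -2.0], [-3.0, -1.0], [-0.5,  0.3]])
    y = np.array([1, 1, 1, -1, -1, -1])

    for C in (1e6, 10.0):   # very large C ~ hard margin; C = 10 allows slack (soft margin)
        clf = SVC(kernel="linear", C=C).fit(X, y)
        w, b = clf.coef_[0], clf.intercept_[0]
        print(f"C = {C:g}: margin = {2 / np.linalg.norm(w):.3f}, "
              f"support vectors = {len(clf.support_)}")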
Application: Pedestrian detection in Computer Vision
• Objective: detect (localize) standing humans in an image
(cf. face detection with a sliding window classifier)
• reduces object detection to binary classification
• does an image window contain a person or not?
Detection problem: a (binary) classification problem
Each window is separately classified
Training data
• 64x128 images of humans cropped from a varied set of personal photos
• Positive data – 1239 positive window examples (plus reflections → 2478)
• Negative data – 1218 person-free training photos (12180 patches)
Training
• A preliminary detector
• Trained with the 2478 positive vs 12180 negative samples
• Retraining
• With the augmented data set
• initial 12180 + hard examples
• Hard examples
• The 1218 negative training photos are searched exhaustively for false positives
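A sketch of this two-stage training with hard examples (scikit-learn's LinearSVC on precomputed HOG feature arrays; the array names, the function layout, and the C value are assumptions made for illustration):

    import numpy as np
    from sklearn.svm import LinearSVC

    def train_detector(X_pos, X_neg, X_neg_pool):
        """X_pos, X_neg: HOG features of the initial positive / negative windows.
        X_neg_pool: HOG features of every window from the person-free photos."""

        def fit(Xn):
            X = np.vstack([X_pos, Xn])
            y = np.concatenate([np.ones(len(X_pos)), -np.ones(len(Xn))])
            return LinearSVC(C=0.01).fit(X, y)      # C value is an assumption

        # preliminary detector: 2478 positives vs 12180 negatives
        clf = fit(X_neg)

        # exhaustively score the negative windows; false positives are the hard examples
        hard = X_neg_pool[clf.decision_function(X_neg_pool) > 0]

        # retrain with the augmented negative set: initial 12180 + hard examples
        return fit(np.vstack([X_neg, hard]))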
Feature: histogram of oriented gradients (HOG)
Averaged examples
Algorithm
• Training (learning)
• Represent each example window by a HOG feature vector
• Train an SVM classifier
• Testing (detection)
• Sliding window classifier
f(x) = w^T x + b
x_i ∈ R^d, with d = 1024
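A minimal sketch of the pipeline (scikit-image's hog and scikit-learn's LinearSVC; the HOG parameters, window stride, and single-scale scan are assumptions made to keep the example short):

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import LinearSVC

    def hog_vector(window):
        # 64x128 grayscale window -> HOG descriptor (parameter values are assumptions)
        return hog(window, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    def train(pos_windows, neg_windows):
        # represent each example window by a HOG feature vector, then train the SVM
        X = np.array([hog_vector(w) for w in list(pos_windows) + list(neg_windows)])
        y = np.array([1] * len(pos_windows) + [-1] * len(neg_windows))
        return LinearSVC().fit(X, y)                # learns w and b in f(x) = w^T x + b

    def detect(clf, image, stride=8):
        # slide a 64x128 window over the image and keep locations where f(x) > 0
        detections = []
        H, W = image.shape
        for top in range(0, H - 128 + 1, stride):
            for left in range(0, W - 64 + 1, stride):
                window = image[top:top + 128, left:left + 64]
                if clf.decision_function([hog_vector(window)])[0] > 0:
                    detections.append((left, top))
        return detections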
Learned model
f(x) = w^T x + b