Outline
• Transform a linear learner into a non-linear learner
• Kernels can make high-dimensional spaces tractable
• Kernels can make non-vectorial data tractable
Non-Linear Problems
Problem:
• some tasks have non-linear structure
• no hyperplane is sufficiently accurate
How can SVMs learn non-linear classification rules?
Extending the Hypothesis Space
Idea: add more features, then learn a linear rule in the feature space.
Example: the separating hyperplane in feature space is a degree-two polynomial in input space.
New Update Rule
• The update rule can be written as
• And the hypothesis h(x) is
– Note: Hypothesis uses only dot products with key examples (“support vectors”)
From http://www.support-vector.net/icml-tutorial.pdf
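The update rule described above (following the cited tutorial) is the dual-form perceptron: a mistake on example i increments a coefficient αi, and the hypothesis uses only kernel evaluations with the mistake-driving examples. A minimal sketch, assuming the plain (unregularized) perceptron variant and an XOR-style toy dataset of my choosing:

```python
# Kernelized (dual) perceptron: a minimal sketch of the update rule.
# Assumption: alpha_i += 1 on each mistake; the hypothesis uses only
# kernel evaluations K(x_i, x) with the examples whose alpha_i > 0
# -- the "support vectors" -- never an explicit feature vector.

def kernel_perceptron(X, y, kernel, epochs=10):
    """Train dual coefficients alpha; labels y are in {-1, +1}."""
    n = len(X)
    alpha = [0] * n
    for _ in range(epochs):
        for j in range(n):
            # Score of example j uses only dot products via the kernel.
            s = sum(alpha[i] * y[i] * kernel(X[i], X[j]) for i in range(n))
            if y[j] * s <= 0:      # mistake on example j
                alpha[j] += 1      # dual update rule
    return alpha

def predict(alpha, X, y, kernel, x):
    s = sum(alpha[i] * y[i] * kernel(X[i], x) for i in range(len(X)))
    return 1 if s > 0 else -1

# Degree-2 polynomial kernel: K(a, b) = (a . b)^2
poly2 = lambda a, b: (sum(p * q for p, q in zip(a, b))) ** 2

# XOR-like data: not linearly separable in input space, but separable
# in the implicit degree-2 feature space.
X = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
y = [1, 1, -1, -1]
alpha = kernel_perceptron(X, y, poly2)
print([predict(alpha, X, y, poly2, x) for x in X])  # → [1, 1, -1, -1]
```

With a linear kernel this data cannot be fit; swapping in the degree-2 polynomial kernel changes only one argument, which is the point of the kernelized update rule.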
Max Margin = Minimal Norm
• If we fix the functional margin to 1, the geometric margin equals 1/||w||
• Hence, maximize the margin by minimizing the norm
– Minimize ½·||w||²
– Subject to yi·(<w, xi> + b) ≥ 1 for all training examples (xi, yi)
From http://www.support-vector.net/icml-tutorial.pdf
Using Kernels: Implicit Features
• We need to compute <xi, xj> many times
– Or <F(xi), F(xj)> if we use features F(x)
• But what if we found a set of features F(x) such that <F(xi), F(xj)> = <xi, xj>² ?
– Then we only need to compute <xi, xj>²
– We don’t even need to know what F is (!)
Calculate using a Kernel
• Two vectors
– A = (1,2)
– B = (3,4)
• Three Features:
– F(x) = {x1², x2², √2·x1·x2}
– Calculate F(A)·F(B)
What is F(A)·F(B) ? A=120 B=121 C=144 D=256
Calculate without using a Kernel
• A = (1, 2), B = (3, 4)
• F(x) = {x1², x2², √2·x1·x2}
– A = (1, 2) → F(A) = {1², 2², √2·1·2} = {1, 4, 2√2}
– B = (3, 4) → F(B) = {3², 4², √2·3·4} = {9, 16, 12√2}
• F(A)·F(B) = 1·9 + 4·16 + 2√2·12√2 = 9 + 64 + 48 = 121
Calculate using a Kernel
• A = (1, 2), B = (3, 4), F(x) = {x1², x2², √2·x1·x2}
• F(A)·F(B) = (A·B)² = (1·3 + 2·4)² = 11² = 121
• We didn’t need to explicitly calculate or even know the terms in F at all – just that F(A)·F(B) = (A·B)²
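The two calculations above can be checked mechanically; a minimal sketch in pure Python:

```python
import math

# Explicit degree-2 feature map F(x) = (x1^2, x2^2, sqrt(2)*x1*x2)
def F(x):
    x1, x2 = x
    return (x1 ** 2, x2 ** 2, math.sqrt(2) * x1 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A, B = (1, 2), (3, 4)

explicit = dot(F(A), F(B))   # build the features, then take the dot product
implicit = dot(A, B) ** 2    # kernel trick: (A·B)^2, F never materialized

print(implicit)              # → 121
print(round(explicit))       # → 121 (up to float rounding in sqrt(2))
```

The explicit route costs one feature expansion per vector; the kernel route is a single dot product and a square, and the gap widens rapidly for higher-degree polynomials.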
SVM with Kernel
Training: maximize Σi αi − ½·Σi Σj αi αj yi yj K(xi, xj) subject to 0 ≤ αi ≤ C and Σi αi yi = 0
Classification: h(x) = sign(Σi αi yi K(xi, x) + b)
New hypothesis spaces through new kernels:
• Linear: K(x, z) = <x, z>
• Polynomial: K(x, z) = (<x, z> + 1)^d
• Radial Basis Function: K(x, z) = exp(−γ·||x − z||²)
• Sigmoid: K(x, z) = tanh(κ·<x, z> + θ)
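The four standard kernels named on this slide have simple closed forms; a sketch with illustrative parameter values (c = 1, d = 2, γ = 0.5, κ = 1, θ = 0 are my choices, not values from the slides):

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear(x, z):
    # K(x, z) = <x, z>
    return dot(x, z)

def polynomial(x, z, c=1.0, d=2):
    # K(x, z) = (<x, z> + c)^d
    return (dot(x, z) + c) ** d

def rbf(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)

def sigmoid(x, z, kappa=1.0, theta=0.0):
    # K(x, z) = tanh(kappa * <x, z> + theta)
    return math.tanh(kappa * dot(x, z) + theta)

x, z = (1, 2), (3, 4)
print(linear(x, z))       # → 11
print(polynomial(x, z))   # → (11 + 1)^2 = 144.0
print(rbf(x, x))          # → 1.0 (RBF of a point with itself)
```

Note that the sigmoid kernel is only a valid (positive semi-definite) kernel for some parameter settings, which is one reason kernel choice is listed later as a design decision.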
Kernels for Non-Vectorial Data
• Applications with non-vectorial input data: classify non-vectorial objects
– Protein classification (x is a string of amino acids)
– Drug activity prediction (x is molecule structure)
– Information extraction (x is sentence of words)
– Etc.
• Applications with non-vectorial output data: predict non-vectorial objects
– Natural Language Parsing (y is a parse tree)
– Noun-Phrase Co-reference Resolution (y is clustering)
– Search engines (y is ranking)
Kernels can compute inner products efficiently!
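As a concrete illustration of a kernel on non-vectorial input (the string case above), a toy k-spectrum kernel counts matching length-k substrings. This is a simplified sketch of my own, not the kernel used in any of the listed applications:

```python
from collections import Counter

# Toy k-spectrum kernel for strings: K(s, t) = inner product of the
# substring-count feature vectors, computed by matching substrings
# directly -- the explicit feature space (one dimension per possible
# length-k string) is never built.
def spectrum_kernel(s, t, k=2):
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[sub] * ct[sub] for sub in cs)

# "ABAB" has substrings {AB: 2, BA: 1}; "BABA" has {BA: 2, AB: 1}
print(spectrum_kernel("ABAB", "BABA"))  # → 2*1 + 1*2 = 4
```

Any learner written purely in terms of inner products (like the kernel perceptron or SVM above) can train on strings by plugging in a kernel like this.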
Properties of SVMs with Kernels
• Expressiveness
– Can represent any boolean function (for an appropriate choice of kernel)
– Can represent any sufficiently “smooth” function to arbitrary accuracy (for an appropriate choice of kernel)
• Computational
– Objective function has no local optima (only one global optimum)
– Independent of the dimensionality of the feature space
• Design decisions
– Kernel type and parameters
– Value of C
Benchmark
• Character recognition
• NIST Database of handwritten digits
– 60,000 samples
– 20x20 pixels
Digit Recognition
• 3-nearest neighbor: 2.4% error rate
– stores all samples
• Single hidden layer NN: 1.6%
– 400 inputs, 10 outputs, 300 hidden units (chosen using CV)
• Specialized nets (LeNet): 0.9%
– Use specialized 2D architecture
• Boosted NN: 0.7%
– Three copies of LeNets
Digit Recognition
• SVM: 1.1%
– Compare to specialized LeNet 0.9%
• Specialized SVM: 0.56%
• Shape Matching: 0.63%
– Machine vision techniques
• Humans?
– A=0.1% B=0.5% C=1% D=2.5% E=5%