Workshop on Webscale Vision and Social Media ECCV 2012, Firenze, Italy October, 2012 Linearized Smooth Additive Classifiers Subhransu Maji Research Assistant Professor Toyota Technological Institute at Chicago

Workshop on Web-scale Vision and Social Media
ECCV 2012, Firenze, Italy

October, 2012

Linearized Smooth Additive Classifiers
Subhransu Maji

Research Assistant Professor
Toyota Technological Institute at Chicago

Generalized Additive Models

• Why use them?

• Efficiency : can be efficiently evaluated

• Interpretability : Simple generalization of linear classifiers, i.e., may lead to models that are interpretable

• Well known in the statistics community

• Generalized Additive Models (Hastie & Tibshirani ’90)

• However, traditional learning algorithms (e.g., the “backfitting algorithm”) do not scale well

$f(x_1, x_2, \ldots, x_n) = f_1(x_1) + f_2(x_2) + \ldots + f_n(x_n)$

Additive kernels in computer vision

• Images are represented as histograms of low-level features such as color and texture [Swain and Ballard ’91, Odone et al. ’05]

• Histogram based similarity measures are typically additive

• Other examples of additive kernels based on approximate correspondence :

Pyramid Match Kernel, Grauman and Darrell, CVPR’05

Spatial Pyramid Match Kernel, Lazebnik, Schmid and Ponce, CVPR’06

$K_{\chi^2}(x, y) = \sum_i \frac{2 x_i y_i}{x_i + y_i}$

$K_{\min}(x, y) = \sum_i \min(x_i, y_i)$
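As a minimal sketch of the two additive kernels above, the following Python snippet evaluates them on a pair of toy histograms. The function names and the example histograms are illustrative assumptions, not part of the talk; empty bins are assumed to contribute zero to the chi-squared kernel.

```python
import numpy as np

def min_kernel(x, y):
    """Histogram intersection kernel: sum_i min(x_i, y_i)."""
    return float(np.minimum(x, y).sum())

def chi2_kernel(x, y, eps=1e-12):
    """Additive chi-squared kernel: sum_i 2 x_i y_i / (x_i + y_i).

    Bins with x_i + y_i == 0 contribute 0; eps only guards the division.
    """
    s = x + y
    terms = np.where(s > 0, 2.0 * x * y / np.maximum(s, eps), 0.0)
    return float(terms.sum())

# Two toy normalized histograms (hypothetical data).
x = np.array([0.5, 0.3, 0.2, 0.0])
y = np.array([0.25, 0.25, 0.5, 0.0])
k_min = min_kernel(x, y)
k_chi2 = chi2_kernel(x, y)
```

Both kernels are sums of per-dimension terms, which is what makes the resulting classifiers additive.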

Directly learning additive classifiers

$f(x_1, x_2, \ldots, x_n) = f_1(x_1) + f_2(x_2) + \ldots + f_n(x_n)$

An SVM-like optimization framework

$\min_{f \in F} \sum_k l\left(y^k, f(x^k)\right) + \lambda R(f)$, with $x^k \in \mathbb{R}^d$ and $y^k \in \{+1, -1\}$

• Loss function on the data, e.g., hinge loss function: $l\left(y^k, f(x^k)\right) = \max\left(0, 1 - y^k f(x^k)\right)$

• Regularization, e.g., derivative norm
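The objective above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores $f(x^k)$ and the value of $R(f)$ are taken as precomputed inputs, and the names `hinge_loss` and `regularized_risk` are hypothetical.

```python
import numpy as np

def hinge_loss(y, score):
    """l(y, f(x)) = max(0, 1 - y * f(x)), applied elementwise."""
    return np.maximum(0.0, 1.0 - y * score)

def regularized_risk(scores, y, reg_value, lam):
    """sum_k l(y^k, f(x^k)) + lam * R(f) for precomputed scores f(x^k)."""
    return float(hinge_loss(y, scores).sum() + lam * reg_value)

# Hypothetical labels, scores and regularizer value.
y = np.array([+1, -1, +1])
scores = np.array([2.0, 0.5, 0.25])
losses = hinge_loss(y, scores)
total = regularized_risk(scores, y, reg_value=4.0, lam=0.5)
```

Only examples that violate the margin ($y^k f(x^k) < 1$) contribute to the loss term.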

Strategy for learning additive classifiers

• Search for representations of the function and regularization for which the optimization can be efficiently solved

• In particular we want to leverage the latest methods for learning linear classifiers (almost linear time algorithms)

$\min_{f \in F} \sum_k l\left(y^k, f(x^k)\right) + \lambda R(f)$, with $x^k \in \mathbb{R}^d$ and $y^k \in \{+1, -1\}$

Basis expansion: $f_i = \sum_j w_{ij} \phi_{ij}$

Regularization: smoothness penalty on the weights

Spline embeddings : motivation

• Represent each function using a uniformly spaced spline basis

• Motivated by our earlier analysis that splines approximate additive classifiers well

• Popularized by Eilers and Marx (P-Splines ’02, ’04) for additive modeling

• Well known in graphics (DeBoor ’01)

• Question: What is the regularization on w?

Figure from Eilers & Marx ’04

$f(x_1, x_2, \ldots, x_n) = f_1(x_1) + f_2(x_2) + \ldots + f_n(x_n)$, with each function written as $\sum_i w_i \phi_i$

Spline embeddings : representation

$f(x) = w^T \Phi(x)$

Representation: project data onto a uniformly spaced spline basis

[Figure: Linear B-Spline, Quadratic B-Spline and Cubic B-Spline basis functions]
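To make the projection concrete, here is a minimal sketch of the linear B-spline case in Python. It assumes each feature has been scaled to $[0, 1]$ and that the basis consists of hat functions centered on uniformly spaced knots; the function name and the number of basis functions are illustrative, not taken from the released code.

```python
import numpy as np

def linear_bspline_encode(x, n_basis):
    """Encode scalars x in [0, 1] on n_basis uniformly spaced hat functions.

    Returns an array of shape (len(x), n_basis). Each row has at most two
    non-zero entries and sums to 1.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    knots = np.linspace(0.0, 1.0, n_basis)
    h = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / h)

# Knots at 0, 0.25, 0.5, 0.75, 1; the point 0.3 falls between knots 1 and 2.
Phi = linear_bspline_encode([0.0, 0.3, 1.0], n_basis=5)
```

A full embedding of a d-dimensional example would concatenate the per-dimension encodings, so the resulting feature vector is high dimensional but sparse.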

Spline embeddings : regularization

Representation: $f(x) = w^T \Phi(x)$

Regularization: $R(f) = w^T H w$, $H = D_d^T D_d$, with $D_0 = I$

• Penalize differences between adjacent weights
• Ensures that the learned function is smooth
• Similar to Penalized Splines of Eilers and Marx ’02
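The penalty can be illustrated with a small Python sketch. It assumes a square, invertible first-order difference matrix (1 on the diagonal, -1 on the sub-diagonal) and defines higher orders as its powers; the exact boundary convention used in the talk is not recoverable from the slides, so treat this as one plausible choice.

```python
import numpy as np

def difference_matrix(n, order):
    """Square n x n difference operator D_order, with D_0 = I.

    D_1 has 1 on the diagonal and -1 on the sub-diagonal, so (D_1 w)_0 = w_0
    and (D_1 w)_i = w_i - w_{i-1} for i >= 1. D_order is D_1 to that power.
    """
    D1 = np.eye(n) - np.eye(n, k=-1)
    return np.linalg.matrix_power(D1, order)

def smoothness_penalty(w, order):
    """R(f) = w^T H w with H = D^T D."""
    D = difference_matrix(len(w), order)
    return float(w @ D.T @ D @ w)

w_smooth = np.array([1.0, 1.0, 1.0, 1.0])   # constant weights
w_rough = np.array([1.0, -1.0, 1.0, -1.0])  # oscillating weights
```

With order 0 both weight vectors have the same penalty (their squared norm), while with order 1 the oscillating vector is penalized far more heavily than the constant one.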

Spline embeddings : optimization

Optimization: $c(w) = \frac{\lambda}{2} w^T D_d^T D_d w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T \Phi(x^k)\right)\right)$

Spline embeddings : linear case

Regularization: zero order differences, $D_0 = I$ [Maji and Berg, ICCV ’09]

Reduces to a standard linear SVM

Projected features are sparse: at most k + 1 non-zero entries for a B-spline basis of degree k (e.g., two for the linear basis).

Optimization: $c(w) = \frac{\lambda}{2} w^T w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T \Phi(x^k)\right)\right)$

Spline embeddings : general case

Regularization: penalize first order differences. Non-standard SVM due to the regularization [Eilers & Marx ’02]

Can offer better smoothness when some bins have few data points

Optimization: $c(w) = \frac{\lambda}{2} w^T D_1^T D_1 w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T \Phi(x^k)\right)\right)$

Spline embeddings : linearization

Original problem: $c(w) = \frac{\lambda}{2} w^T D_d^T D_d w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T \Phi(x^k)\right)\right)$

Re-parameterization of the weight (shown for $d = 1$; $w$ now denotes the transformed weight $D_1 w$): $c(w) = \frac{\lambda}{2} w^T w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T D_1^{-T} \Phi(x^k)\right)\right)$

$D_1^{-T} = \begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \\ 0 & 1 & \cdots & 1 & 1 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 1 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}$ (upper triangular)

Equivalently, a change of basis (re-parameterization of the features): $\tilde{\Phi} = D_1^{-T} \Phi$

Implicit kernel: $K_{11} = \tilde{\Phi}^T \tilde{\Phi}$
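The re-parameterization can be checked numerically. The sketch below assumes a square first-order difference matrix with 1 on the diagonal and -1 on the sub-diagonal (an assumption chosen so that its inverse transpose is the upper triangular matrix of ones shown on the slide), and uses random vectors as stand-ins for the weights and for $\Phi(x^k)$.

```python
import numpy as np

n = 6
D1 = np.eye(n) - np.eye(n, k=-1)          # square first-order differences
D1_inv_T = np.linalg.inv(D1).T            # expected: upper triangular ones
upper_ones = np.triu(np.ones((n, n)))

rng = np.random.default_rng(0)
w = rng.normal(size=n)                    # weights of the original problem
phi = rng.random(n)                       # stand-in for Phi(x^k)

w_tilde = D1 @ w                          # re-parameterized weights
phi_tilde = D1_inv_T @ phi                # re-parameterized features

reg_original = w @ D1.T @ D1 @ w          # w^T D_1^T D_1 w
reg_modified = w_tilde @ w_tilde          # standard squared norm
score_original = w @ phi
score_modified = w_tilde @ phi_tilde

# Multiplying by D_1^{-T} is a reversed cumulative sum, so no matrix is needed.
phi_tilde_fast = np.cumsum(phi[::-1])[::-1]
```

Both the regularizer and the classifier score are unchanged by the substitution, which is why the modified problem can be handed to a standard linear SVM solver. The price is that the transformed features are dense.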

Spline embeddings : visualizing the kernel

Various basis and regularization

Fixing order(regularization)=1, and varying degree(basis)

Smooth variants of the min kernel
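The connection to the min kernel can be illustrated for the linear basis with first-order regularization. The sketch below makes several assumptions that are not stated on the slides: features lie in $[0, 1)$, the basis consists of hat functions on uniformly spaced knots with spacing $h$, and $D_1$ is the square difference matrix whose inverse transpose is upper triangular ones. Under these assumptions the implicit kernel, after removing a constant offset of 1 and rescaling by $h$, should equal $\min(x, y)$ whenever the two points fall in different bins and deviate by at most $h/4$ when they share a bin.

```python
import numpy as np

def implicit_kernel(x, y, n_basis):
    """Return (K(x, y), h) for hat-function features transformed by D_1^{-T}."""
    knots = np.linspace(0.0, 1.0, n_basis)
    h = knots[1] - knots[0]

    def hat(t):
        return np.maximum(0.0, 1.0 - np.abs(t - knots) / h)

    # D_1^{-T} Phi is a reversed cumulative sum of the hat-function encoding.
    tx = np.cumsum(hat(x)[::-1])[::-1]
    ty = np.cumsum(hat(y)[::-1])[::-1]
    return float(tx @ ty), h

rng = np.random.default_rng(0)
pairs = rng.random((200, 2))              # hypothetical feature values
max_gap = 0.0
for x, y in pairs:
    k, h = implicit_kernel(x, y, n_basis=11)
    max_gap = max(max_gap, abs(h * (k - 1.0) - min(x, y)))
```

The deviation inside a bin is what makes the kernel a smoothed version of the min kernel; finer knot spacing shrinks it.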

Spline embeddings : visualizing the basis

Various basis and regularization

$\phi_i(x) = (x - \tau_i)_+^r$

The basis is equivalent to a truncated polynomial basis if degree(basis) - order(regularization) = 1 [DeBoor ’01]

Solving the optimization efficiently

Original problem : non-standard regularization

$c(w) = \frac{\lambda}{2} w^T D_d^T D_d w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T \Phi(x^k)\right)\right)$

Modified problem : dense features (memory bottleneck)

$c(w) = \frac{\lambda}{2} w^T w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T D_d^{-T} \Phi(x^k)\right)\right)$

Solution : implicit representation

• maintain and update an implicit weight vector $w_d$
• classification: $f(x) = w_d^T \Phi(x)$, $O(d)$ vs $O(n)$

Solving the optimization efficiently

Computing the updates

• classification: $f(x) = w_d^T \Phi(x)$
• Given n linearly spaced basis functions of degree d, computing updates to $w_d$ takes $O(n^2)$ time.
• However, one can compute it in $O(nd)$ time by exploiting the structure of D (see paper below)
• Hence these classifiers can be trained with very low memory overhead, without compromising training time

S. Maji, Linearized Smooth Additive Classifiers, ECCV WS 2012
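The idea of exploiting the structure of D can be sketched for first-order regularization. This is not the paper's algorithm, only an illustration under assumptions: $D_1$ is the square difference matrix with 1 on the diagonal and -1 on the sub-diagonal, the implicit weights are taken to be $w_d = D_1^{-1} w$ (which follows from requiring $w^T D_1^{-T} \Phi(x) = w_d^T \Phi(x)$), and the learning step is a plain hinge-loss subgradient step with the regularizer omitted. All function names are hypothetical.

```python
import numpy as np

def apply_inv_gram_dense(v):
    """Reference: D_1^{-1} D_1^{-T} v using dense matrices (quadratic cost)."""
    n = len(v)
    D1 = np.eye(n) - np.eye(n, k=-1)
    D1_inv = np.linalg.inv(D1)
    return D1_inv @ (D1_inv.T @ v)

def apply_inv_gram_fast(v):
    """Same product using two cumulative sums (linear cost)."""
    suffix = np.cumsum(v[::-1])[::-1]     # D_1^{-T} v: reversed cumulative sum
    return np.cumsum(suffix)              # D_1^{-1} (.): cumulative sum

def sgd_step(w_d, phi, y, lr):
    """One hinge-loss subgradient step on the implicit weights w_d.

    A step w += lr * y * D_1^{-T} phi on the dense-feature problem becomes
    w_d += lr * y * D_1^{-1} D_1^{-T} phi on the implicit weights.
    """
    if y * (w_d @ phi) < 1.0:
        w_d = w_d + lr * y * apply_inv_gram_fast(phi)
    return w_d

rng = np.random.default_rng(0)
v = rng.random(8)
dense_result = apply_inv_gram_dense(v)
fast_result = apply_inv_gram_fast(v)
```

Because only the sparse features $\Phi(x)$ and the vector $w_d$ are ever stored, the dense transformed features never need to be materialized.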

Spline embeddings : computational tradeoffs

[Figure: grid of configurations, spline degree (linear, quadratic, cubic) against regularization order (D0, D1, D2)]

Experiments on DC pedestrian dataset (n = 20); IKSVM training time : 360s

Increasing spline degree (D0, standard linear solver)

• Test time increases linearly
• Accuracy increases
• Training time increases linearly
• linear: 5.96s, 90.23%; quadratic: 7.26s, 90.34%; cubic: 10.08s, 90.39%

Increasing regularization order (custom solver for order > 1)

• Test time constant
• Accuracy peaks at D1. This suggests that first order smoothness is sufficient
• Training time increases linearly
• D0: 5.96s, 90.23%; D1: 32.43s, 91.20%; D2: 246.87s, 89.06%

Useful regime for most datasets

• most accurate, closely approximates IKSVM
• fastest

Experiments on MNIST, INRIA pedestrians, Caltech 101, DC pedestrians, etc. All these are still an order of magnitude faster than a traditional SVM solver. These have the same memory overhead as linear SVMs since the features are computed online.

General basis expansion

$f(x_1, x_2, \ldots, x_n) = f_1(x_1) + f_2(x_2) + \ldots + f_n(x_n)$, with each function written as $\sum_i w_i \phi_i$

• Local splines are the basis
• In general can choose any orthonormal basis

Fourier embeddings : representation

Representation: $f(x) = \sum_i w_i \psi_i(x)$

An orthonormal basis $\psi_1(x), \psi_2(x), \ldots, \psi_n(x)$: $\int_a^b \psi_i(x) \psi_j(x) \mu(x)\,dx = \delta_{i,j}$

Examples: Trigonometric functions, Wavelets, etc.

Fourier embeddings : regularization

Representation: $f(x) = \sum_i w_i \psi_i(x)$, with an orthonormal basis $\psi_1(x), \psi_2(x), \ldots, \psi_n(x)$

Regularization : penalize the d’th order derivative

$R(f) = w^T H w$, $H_{i,j} = \int_a^b \psi_i^{(d)}(x) \psi_j^{(d)}(x) \mu(x)\,dx$

$\int_a^b f^{(d)}(x)^2 \mu(x)\,dx = \sum_{i,j} w_i w_j \int_a^b \psi_i^{(d)}(x) \psi_j^{(d)}(x) \mu(x)\,dx$

• Similar to the spline case (different basis)
• Requires the basis to be differentiable

Fourier embeddings : optimization

Optimization: $c(w) = \frac{\lambda}{2} w^T H w + \frac{1}{n} \sum_k \max\left(0, 1 - y^k \left(w^T \Psi(x^k)\right)\right)$

• H is not diagonal - cannot directly use fast linear solvers
• In general it is not structured either (unlike splines)

Practical solution: pick an orthogonal basis with orthogonal derivatives

Cross terms disappear, i.e., H is diagonal again:

$\int_a^b f^{(d)}(x)^2 \mu(x)\,dx = \sum_{i,j} w_i w_j \int_a^b \psi_i^{(d)}(x) \psi_j^{(d)}(x) \mu(x)\,dx = \sum_i H_{i,i} w_i^2$

Examples: Trigonometric functions, one of Jacobi, Laguerre or Hermite polynomials

M. Webster, Orthogonal polynomials with orthogonal derivatives. Mathematische Zeitschrift, 39:634–638, 1935
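As a numerical illustration of a basis with orthogonal derivatives, the sketch below uses $\psi_i(x) = \sqrt{2}\cos(i \pi x)$ on $[0, 1]$ with weight $\mu(x) = 1$ and first-order derivatives ($d = 1$). This particular trigonometric family is an assumption chosen for illustration; the slides do not specify which one is used. The integrals are approximated with a hand-written trapezoid rule.

```python
import numpy as np

def trapezoid(values, x):
    """Trapezoid rule along the last axis."""
    dx = np.diff(x)
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * dx, axis=-1)

x = np.linspace(0.0, 1.0, 20001)
freqs = np.arange(1, 5)                                   # i = 1..4
arg = np.pi * freqs[:, None] * x
psi = np.sqrt(2.0) * np.cos(arg)                          # psi_i(x)
dpsi = -np.sqrt(2.0) * np.pi * freqs[:, None] * np.sin(arg)  # psi_i'(x)

# Gram matrix of the basis: approximately the identity (orthonormal).
gram = trapezoid(psi[:, None, :] * psi[None, :, :], x)
# Regularization matrix H: approximately diag((i * pi)^2).
H = trapezoid(dpsi[:, None, :] * dpsi[None, :, :], x)
```

Since H is diagonal here, dividing each feature $\psi_i$ by $\sqrt{H_{i,i}}$ turns the penalty into a plain squared norm, so a standard linear solver can be applied to the rescaled features.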

Fourier Embeddings : Two practical ones

Two families of orthogonal bases with orthogonal derivatives

Embeddings that penalize the first and second order derivatives

Learning : project data onto the first few basis functions and use a linear solver such as LIBLINEAR

Fourier features are low dimensional and dense, as opposed to spline features, which are high dimensional but sparse.

Comparison of various additive classifiers

[Figures: DC pedestrian dataset; MNIST dataset]

Software

• Code to train large scale additive classifiers. Provides functions:

  • train : takes (y, x) as input and outputs additive classifiers
    • choice of various encodings and regularizations
    • encodings are computed online
    • implements efficient weight updates for spline features

  • classify : takes a learned classifier and features, outputs decision values

  • encode : returns encoded features which can be directly used with any linear solver

• Download at:

• http://ttic.uchicago.edu/~smaji/libspline-release1.0.tar.gz

Conclusions

• We discussed methods to directly learn additive classifiers based on a regularized loss minimization

• We proposed two kinds of basis for which the learning problem can be efficiently solved, Spline and Fourier embeddings

• Spline embeddings are sparse, easy to compute, and can be used to learn classifiers with almost no memory overhead, compared to learning linear classifiers.

• Fourier embeddings (Trigonometric and Hermite) are low dimensional but relatively expensive to compute, hence they are useful in settings where the features can be stored in memory

• More experimental details and code can be found on the author’s website (ttic.uchicago.edu/~smaji)