SPG-GMKL: Generalized Multiple Kernel Learning
with a Million Kernels
Ashesh Jain, IIT Delhi (asheshjain399@gmail.com)
S. V. N. Vishwanathan, Purdue University (vishy@stat.purdue.edu)
Manik Varma, Microsoft Research India (manik@microsoft.com)
Paper ID: rt540
1. Kernel Learning
•Jointly learning both SVM and kernel parameters
from training data
• Kernel parameterizations
• Linear: K = Σᵢ dᵢKᵢ
• Non-linear: K = Πᵢ Kᵢ = Πᵢ e^(−dᵢDᵢ)
• Regularizers
• Sparse l1
• Sparse and non-sparse lp>1
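A minimal NumPy sketch of the two parameterizations (the Gaussian form of the base kernels and the toy data/weights are illustrative choices, not the poster's):

```python
import numpy as np

def linear_combo(base_kernels, d):
    """Linear parameterization: K = sum_i d_i K_i."""
    return sum(di * Ki for di, Ki in zip(d, base_kernels))

def product_combo(sq_dists, d):
    """Non-linear parameterization:
    K = prod_i K_i = prod_i exp(-d_i D_i) = exp(-sum_i d_i D_i),
    where D_i holds squared distances on feature subset i."""
    return np.exp(-sum(di * Di for di, Di in zip(d, sq_dists)))

# Toy example on 3 points, one distance matrix per feature (made up).
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))
D = [(X[:, [j]] - X[:, [j]].T) ** 2 for j in range(2)]
d = np.array([0.7, 0.3])
K_lin = linear_combo([np.exp(-Di) for Di in D], d)
K_prod = product_combo(D, d)
```

Both combinations stay symmetric positive semi-definite kernel matrices whenever the base kernels are, which is what lets the SVM below consume K directly.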
2. GMKL Formulation for binary classification
P = min_{w,b,d,ξ} ½wᵗw + C Σᵢ ξᵢ + r(d)
s.t. yᵢ(wᵗφ_d(xᵢ) + b) ≥ 1 − ξᵢ, ξᵢ ≥ 0 & d ∈ D

Intermediate Dual:
D = min_d max_α 1ᵗα − ½αᵗYK_dYα + r(d)
s.t. 1ᵗYα = 0, 0 ≤ α ≤ C & d ∈ D
• Can be extended to other loss functions and regularizers
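The dual objective and its constraints transcribe directly into code; a small hand-checkable sketch (variable names and numbers are mine, not the paper's):

```python
import numpy as np

def gmkl_dual_objective(alpha, K_d, y, r_d):
    """Intermediate dual objective: 1'a - 1/2 a' Y K_d Y a + r(d),
    with Y = diag(y)."""
    Ya = y * alpha                     # Y alpha
    return alpha.sum() - 0.5 * Ya @ K_d @ Ya + r_d

def is_dual_feasible(alpha, y, C, tol=1e-9):
    """Checks the constraints 1'Y alpha = 0 and 0 <= alpha <= C."""
    return (abs(y @ alpha) < tol
            and np.all(alpha >= -tol) and np.all(alpha <= C + tol))

# Tiny example: 1'a = 2, a'YKYa = 1, r(d) = 0.2 -> objective 1.7.
alpha = np.array([1.0, 1.0])
y = np.array([1.0, -1.0])
K_d = np.array([[1.0, 0.5], [0.5, 1.0]])
obj = gmkl_dual_objective(alpha, K_d, y, r_d=0.2)
```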
3. Wrapper method for optimizing GMKL
Outer loop: min_d W(d) s.t. d ∈ D
Inner loop: W(d) = max_α 1ᵗα − ½αᵗYK_dYα + r(d)
            s.t. 1ᵗYα = 0 & 0 ≤ α ≤ C
• ∇_d W = −½α*ᵗY(∂K/∂d)Yα* + ∂r/∂d
• α* can be obtained using a standard SVM solver
• Optimized using Projected Gradient Descent (PGD)
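A minimal sketch of this wrapper, assuming the linear combination K = Σᵢ dᵢKᵢ, r(d) = λ‖d‖₁, feasible set D = {d ≥ 0}, and a simple bias-free coordinate-ascent SVM standing in for the inner solver (all illustrative choices, not the paper's implementation):

```python
import numpy as np

def solve_svm_dual(K, y, C, n_iter=100):
    """Inner loop stand-in: bias-free SVM dual by coordinate ascent
    (dropping b removes the 1'Y alpha = 0 equality constraint):
    max 1'a - 1/2 a' Y K Y a,  0 <= a <= C."""
    Q = K * np.outer(y, y)                      # Y K Y
    a = np.zeros(len(y))
    for _ in range(n_iter):
        for i in range(len(y)):
            if Q[i, i] > 0:
                a[i] = np.clip(a[i] + (1.0 - Q[i] @ a) / Q[i, i], 0.0, C)
    return a

def pgd_wrapper(base_kernels, y, C=1.0, step=0.05, n_outer=25, lam=1.0):
    """Outer loop: projected gradient descent on W(d)."""
    d = np.full(len(base_kernels), 1.0 / len(base_kernels))
    for _ in range(n_outer):
        K_d = sum(di * Ki for di, Ki in zip(d, base_kernels))
        a = solve_svm_dual(K_d, y, C)           # alpha* at current d
        Ya = y * a
        # grad_d W = -1/2 a*'Y (dK/dd_i) Y a* + dr/dd_i, dK/dd_i = K_i
        g = np.array([-0.5 * Ya @ Ki @ Ya for Ki in base_kernels]) + lam
        d = np.maximum(d - step * g, 0.0)       # project back onto D
    return d

# Toy usage: one informative kernel, one uninformative (made-up data).
y = np.array([1.0, 1.0, -1.0, -1.0])
x = np.array([2.0, 1.5, -1.0, -2.0])
d = pgd_wrapper([np.outer(x, x), np.eye(4)], y)
```

The key structural point is that the outer loop only needs α* and ∂K/∂d from the inner solve, so any off-the-shelf SVM solver can be plugged in.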
4. PGD limitations
• Requires many function and gradient evaluations, because:
• No step size information is available
• Armijo rule might reject many step size proposals
• Noisy gradient can lead to many tiny steps
• Solving SVMs to high precision to obtain accurate function
and gradient values is very expensive
• Repeated projection onto the feasible set can be expensive
5. SPG Solution
SPG is obtained by adding four components to PGD
1. Spectral Step Length
λ_SPG = ⟨dₙ − dₙ₋₁, dₙ − dₙ₋₁⟩ / ⟨dₙ − dₙ₋₁, ∇W(dₙ) − ∇W(dₙ₋₁)⟩
2. Non-monotone line search
W(dₙ − s∇W(dₙ)) ≤ max_{0≤j≤M} W(dₙ₋ⱼ) − γs‖∇W(dₙ)‖₂²
3. Inner SVM precision tuning
• Quicker steps when away from minima
4. Minimizing the number of projections
6. SPG Algorithm
1: n ← 0; initialize d⁰ randomly
2: repeat
3:   α* ← SolveSVM_ε(K(dⁿ))
4:   λ ← SpectralStepLength
5:   pⁿ ← dⁿ − P(dⁿ − λ∇W(dⁿ, α*))
6:   sⁿ ← NonMonotoneLineSearch
7:   ε ← TuneSVMPrecision
8:   dⁿ⁺¹ ← dⁿ − sⁿpⁿ
9: until converged
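Steps 1–9 above, minus the GMKL-specific SVM precision tuning of step 7, can be sketched for a generic objective (the toy problem, constants, and safeguards are illustrative):

```python
import numpy as np

def spg(f, grad, project, d0, M=10, gamma=1e-4, max_iter=100, tol=1e-8):
    """Spectral Projected Gradient sketch: spectral (Barzilai-Borwein)
    step length plus a non-monotone line search over the last M
    objective values, on a convex set given by `project`."""
    d = project(np.asarray(d0, dtype=float))
    g = grad(d)
    fvals = [f(d)]
    lam = 1.0
    for _ in range(max_iter):
        p = d - project(d - lam * g)            # projected gradient step
        if np.linalg.norm(p) < tol:
            break                                # stationary point reached
        # Non-monotone rule: compare against max of last M values.
        s, f_ref = 1.0, max(fvals[-M:])
        while f(d - s * p) > f_ref - gamma * s * (g @ p) and s > 1e-10:
            s *= 0.5                             # backtrack
        d_new = d - s * p
        g_new = grad(d_new)
        sd, yg = d_new - d, g_new - g
        if sd @ yg > 0:                          # spectral step length
            lam = (sd @ sd) / (sd @ yg)
        d, g = d_new, g_new
        fvals.append(f(d))
    return d

# Toy usage (made-up problem): minimize ||d - (1, -2)||^2 over d >= 0;
# the constrained minimizer is (1, 0).
target = np.array([1.0, -2.0])
d_star = spg(f=lambda d: float(np.sum((d - target) ** 2)),
             grad=lambda d: 2.0 * (d - target),
             project=lambda d: np.maximum(d, 0.0),
             d0=[0.5, 0.5])
```

Because the reference value is a max over a window of past objectives, occasional increases in f are tolerated, which is what makes the method robust to noisy inner-SVM values.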
7. Spectral Step Length
• Quadratic approximation: ½λxᵗx + bᵗx + d
[Figure: the original function next to its quadratic approximation, with iterates x0, x1 and the minimizer x*]
8. Non-Monotone Rule
• Handling function and gradient noise
[Figure: objective value f(x) vs. time (s) for SPG-GMKL, converging toward the global minimum]
9. SVM Precision Tuning
[Figure: per-step SVM solve times for PGD (0.5–3 hr) vs. SPG with tuned precision (0.1–0.3 hr)]
10. Results on Large Scale Data Sets
• Sum of kernels subject to lp≥1 regularization
• Product of kernels subject to lp≥1 regularization

                                Sum of kernels        Product of kernels
Data Set    #Train    #Kernels  PGD (hrs)  SPG (hrs)  PGD (hrs)  SPG (hrs)
Letter      20,000    16        18.66      0.67       18.69      0.66
Poker       25,010    10        5.57       0.49       2.29       0.96
Adult-8     22,696    42        –          1.73       –          3.42
Web-7       24,692    43        –          0.88       –          1.33
RCV1        20,242    50        –          18.17      –          15.93
Cod-RNA     59,535    8         –          3.45       –          8.99

Data Set    #Train    #Kernels   PGD (hrs)  SPG (hrs)  PGD (hrs)  SPG (hrs)
Adult-9     32,561    50         35.84      4.55       31.77      4.42
Cod-RNA     59,535    50         –          25.17      66.48      19.10
KDDCup04    50,000    50         –          40.10      –          42.20
Sonar       208       1 Million  –          105.62
Covertype   581,012   5          –          64.46

11. SPG Scaling Properties
• Scaling with the number of training points
[Plot: log(Time) vs. log(#Training Points) on Adult, for SPG and PGD at p = 1.0, 1.33, 1.66]
• Scaling with the number of kernels
[Plot: log(Time) vs. log(#Kernels) on Sonar, for SPG and PGD at p = 1.33, 1.66]

12. Results on Small Scale Data Sets
• Sum of kernels subject to l1 regularization

Data Set       SimpleMKL (s)   Shogun (s)    PGD (s)       SPG (s)
Wpbc           400 ± 356.4     15 ± 7.7      38 ± 17.6     6 ± 4.2
Breast-Cancer  676 ± 356.4     12 ± 1.2      57 ± 85.1     5 ± 0.6
Australian     383 ± 33.5      1094 ± 621.6  29 ± 7.1      10 ± 0.8
Ionosphere     1247 ± 680.0    107 ± 18.8    1392 ± 824.2  39 ± 6.8
Sonar          1468 ± 1252.7   935 ± 65.0    –             273 ± 64.0
13. SPG Contributions
• SPG can handle any regularization
and kernel parameterization
• SPG is more robust to noisy function
and gradient values
• SPG requires fewer function and
gradient evaluations