
Linear Algebra in and for Least Squares Support Vector Machines

[email protected]
www.bartdemoor.be

Katholieke Universiteit Leuven
Department of Electrical Engineering

ESAT-STADIUS

1 / 69


Outline

1 Big Data

2 Low is difficult, high is easy

3 Regression

4 Classification

5 Dimensionality reduction

6 Correlation analysis

7 Extensions

8 Applications

9 Conclusions

2 / 69



Wishlist

Big Data: Volume, Velocity, Variety, . . .

Dealing with high dimensional input spaces

Need for powerful black box modeling techniques

Avoid pitfalls of nonlinear optimization (convergence issues, local minima, . . . )

Preferable: (numerical) linear algebra, convex optimization

Algorithms for (un-)supervised function regression and estimation, (predictive) modeling, clustering and classification, data dimensionality reduction, correlation analysis (spatial-temporal modeling), feature selection, (early - intermediate - late) data fusion, ranking, outlier and fraud detection, decision support systems (process industry 4.0, digital health, . . . )

. . .

4 / 69


Wishlist

             Linear Algebra   LS-SVM / Kernel
Supervised   Least squares    Function Estimation
             Classification   Kernel Classification
Unsupervised SVD - PCA        Kernel PCA
             Angles - CCA     Kernel CCA

5 / 69



System Identification

System Identification: PEM

LTI models

Non-convex optimization

Considered ‘solved’ by the early nineties

Linear Algebra approach

⇒ Large block Hankel data matrices; SVD; orthogonal and oblique projections; least squares ⇒ subspace methods
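The first step of such a subspace method is easy to sketch: the snippet below (Python/NumPy, my own toy illustration with a synthetic first-order impulse response, not course code) builds a block Hankel matrix and reads the model order off its numerical rank.

import numpy as np

def block_hankel(signal, num_block_rows):
    # Stack delayed copies of a (samples x channels) signal into a block
    # Hankel matrix, the starting point of subspace identification.
    sig = np.atleast_2d(np.asarray(signal, dtype=float))
    if sig.shape[0] < sig.shape[1]:
        sig = sig.T                                  # rows = time samples
    n_samples, _ = sig.shape
    n_cols = n_samples - num_block_rows + 1
    return np.vstack([sig[i:i + n_cols].T for i in range(num_block_rows)])

y = 0.8 ** np.arange(200)                            # impulse response of a first-order LTI system
H = block_hankel(y, num_block_rows=10)
print(H.shape, np.linalg.matrix_rank(H))             # (10, 191) 1 -> rank reveals the order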

7 / 69


Polynomial optimization problems

Multivariate polynomial optimization problems

Multivariate polynomial objective function + constraints

Non-convex optimization

Computer Algebra, Homotopy methods, Numerical Optimization

Linear Algebra approach

⇒ Macaulay matrix; SVD; Kernel; Realization theory ⇒ Smallest eigenvalue of large matrix
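The Macaulay-matrix route itself is too involved for a slide, but the univariate special case already shows the idea of trading polynomial optimization for an eigenvalue problem: the stationary points are eigenvalues of the companion matrix of the derivative. This Python/NumPy sketch is my own illustration, not the multivariate method referenced above.

import numpy as np

# minimize p(x) = x^4 - 3 x^2 + x over the reals
p = np.array([1.0, 0.0, -3.0, 1.0, 0.0])      # coefficients, highest degree first
dp = np.polyder(p)                            # p'(x) = 4 x^3 - 6 x + 1
stationary = np.roots(dp)                     # roots = eigenvalues of the companion matrix of p'
real_stat = stationary[np.abs(stationary.imag) < 1e-9].real
values = np.polyval(p, real_stat)
print(real_stat[np.argmin(values)], values.min())   # global minimizer among the stationary points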

8 / 69


Least Squares Support Vector Machines. . .

Nonlinear regression, modelling and clustering

Most regression, modelling and clustering problems are nonlinear when formulated in the input data space

This requires nonlinear, nonconvex optimization algorithms

Linear Algebra approach

⇒ Least Squares Support Vector Machines

‘Kernel trick’ = projection of input data to a high-dimensional feature space

The regression, modelling or clustering problem becomes a large-scale linear algebra problem (set of linear equations, eigenvalue problem)

9 / 69


Least Squares Support Vector Machines. . .

10 / 69


Least Squares Support Vector Machines. . .

             Linear Algebra   LS-SVM / Kernel
Supervised   Least squares    Function Estimation
             Classification   Kernel Classification
Unsupervised SVD - PCA        Kernel PCA
             Angles - CCA     Kernel CCA

11 / 69



Least squares

Xw = y

Consistent ⇐⇒ rank(X) = rank([X y])

Then the estimate w is unique iff X has full column rank

Inconsistent if rank([X y]) = rank(X) + 1

Find the ‘best’ linear combination of the columns of X to approximate y =⇒ Least Squares
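A quick numerical check of these rank conditions (Python/NumPy, with synthetic data of my own; not from the slides):

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 3))                       # full column rank (almost surely)
y_in_range = X @ np.array([1.0, -2.0, 0.5])           # lies in the column space of X
y_off_range = y_in_range + rng.standard_normal(6)     # generic perturbation leaves the column space

def consistent(X, y):
    # Xw = y is solvable  <=>  rank([X y]) == rank(X)
    return np.linalg.matrix_rank(np.column_stack([X, y])) == np.linalg.matrix_rank(X)

print(consistent(X, y_in_range))    # True
print(consistent(X, y_off_range))   # False: rank([X y]) = rank(X) + 1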

13 / 69


Least squares

min_w ‖y − Xw‖_2^2 = min_w w^T X^T X w − 2 y^T X w + y^T y

Derivatives w.r.t. w =⇒ normal equations

X^T X w = X^T y

If X has full column rank: w = (X^T X)^{-1} X^T y
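In code, the normal-equation formula and a numerically safer least-squares solver give the same estimate (Python/NumPy sketch on synthetic data; my illustration):

import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 4))
w_true = np.array([2.0, -1.0, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.standard_normal(100)

w_normal = np.linalg.solve(X.T @ X, X.T @ y)          # w = (X^T X)^{-1} X^T y via the normal equations
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)       # SVD-based solver, better conditioned in practice
print(np.allclose(w_normal, w_lstsq))                 # True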

14 / 69


Least squares

Equivalently: call e = y − Xw

min_{e,w} ‖e‖_2^2 = e^T e subject to y − Xw − e = 0

Lagrangean L(e, w, l) = (1/2) e^T e + l^T (y − Xw − e)

∂L/∂e = 0 = e − l
∂L/∂w = 0 = X^T l
∂L/∂l = 0 = y − Xw − e

=⇒ e = l
=⇒ X^T e = 0
=⇒ X^T X w = X^T y

15 / 69


Least squares

Consider

min_{e,w} (1/2) e^T V^{-1} e + (1/2) w^T W^{-1} w

subject to y = Xw + e

This is maximum likelihood / Bayesian with priors

e ∼ N(0, V) and w ∼ N(0, W).

Lagrangean

L(w, l, e) = (1/2) e^T V^{-1} e + (1/2) w^T W^{-1} w − l^T (y − Xw − e)

16 / 69


Least squares

∂L/∂w = 0 = W^{-1} w + X^T l
∂L/∂l = 0 = y − Xw − e
∂L/∂e = 0 = V^{-1} e + l

Hence

[ 0     X       I      ] [ l ]   [ y ]
[ X^T   W^{-1}  0      ] [ w ] = [ 0 ]
[ I     0       V^{-1} ] [ e ]   [ 0 ]

Karush-Kuhn-Tucker equations
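Assembled literally, the KKT system can be solved as one linear system; a small Python/NumPy sketch with synthetic X, V, W of my own choosing:

import numpy as np

rng = np.random.default_rng(2)
p, q = 8, 3
X = rng.standard_normal((p, q))
y = rng.standard_normal(p)
V = np.diag(rng.uniform(0.5, 2.0, p))                 # prior e ~ N(0, V)
W = np.diag(rng.uniform(0.5, 2.0, q))                 # prior w ~ N(0, W)

KKT = np.block([
    [np.zeros((p, p)), X,                np.eye(p)       ],
    [X.T,              np.linalg.inv(W), np.zeros((q, p))],
    [np.eye(p),        np.zeros((p, q)), np.linalg.inv(V)],
])
rhs = np.concatenate([y, np.zeros(q), np.zeros(p)])
l, w, e = np.split(np.linalg.solve(KKT, rhs), [p, p + q])
print(np.allclose(X @ w + e, y))                      # the constraint y = Xw + e is satisfied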

17 / 69


Least squares

Let X ∈ R^{p×q}. Eliminate e = y − Xw:

V^{-1} y − V^{-1} X w + l = 0
W^{-1} w + X^T l = 0

Primal: eliminate l
w = (X^T V^{-1} X + W^{-1})^{-1} X^T V^{-1} y
→ ‘small’ q × q inverse

Dual: eliminate w
l = −(V + X W X^T)^{-1} y
→ ‘large’ p × p inverse
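Both eliminations yield the same estimate, as a quick Python/NumPy check on synthetic data confirms (again my own toy example, with many rows so that the primal q × q solve is the cheap one):

import numpy as np

rng = np.random.default_rng(3)
p, q = 500, 5
X = rng.standard_normal((p, q))
y = X @ rng.standard_normal(q) + 0.1 * rng.standard_normal(p)
V = np.eye(p)                                          # simple choices for the priors
W = 10.0 * np.eye(q)

# primal: q x q inverse
w_primal = np.linalg.solve(X.T @ np.linalg.solve(V, X) + np.linalg.inv(W),
                           X.T @ np.linalg.solve(V, y))
# dual: p x p inverse, then recover w from W^{-1} w + X^T l = 0
l = -np.linalg.solve(V + X @ W @ X.T, y)
w_dual = -W @ X.T @ l
print(np.allclose(w_primal, w_dual))                   # True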

18 / 69


Least Squares Support Vector Machine Regression

Given X ∈ R^{N×q}, y ∈ R^N with i-th row x_i. Consider a nonlinear vector function ϕ(x_i) ∈ R^{1×q} and the constrained least squares optimization problem:

min_{w,e} (1/2) w^T w + (γ/2) e^T e

subject to

y = [ϕ(x_1); . . . ; ϕ(x_N)] w + e = X_ϕ w + e

(the rows ϕ(x_i) stacked into X_ϕ)

19 / 69


Least Squares Support Vector Machine Regression

Lagrangean

L(w, e, l) = (1/2) w^T w + (γ/2) e^T e + l^T (y − X_ϕ w − e)

Eliminate e and w to find

l = (X_ϕ X_ϕ^T + (1/γ) I_N)^{-1} y

20 / 69


Least Squares Support Vector Machine Regression

Call X_ϕ X_ϕ^T (appearing above together with the ridge term (1/γ) I_N) the kernel matrix, with element (i, j): K(x_i, x_j) = ϕ(x_i) · ϕ(x_j)^T, an ‘inner product’.

Then, obviously,

y(x) = ∑_{i=1}^N K(x_i, x) l_i
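A minimal LS-SVM regression sketch along these lines (Python/NumPy with an RBF kernel; a toy illustration of the formulas above, not the LS-SVMlab toolbox, and it omits the bias term b of the full LS-SVM formulation):

import numpy as np

def rbf_kernel(A, B, sigma=0.5):
    # K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(-3, 3, (50, 1)), axis=0)       # training inputs
y = np.sinc(X[:, 0]) + 0.05 * rng.standard_normal(50)  # noisy target

gamma = 100.0
K = rbf_kernel(X, X)
l = np.linalg.solve(K + np.eye(len(y)) / gamma, y)     # dual variables: l = (K + I/gamma)^{-1} y

X_test = np.linspace(-3, 3, 200)[:, None]
y_hat = rbf_kernel(X_test, X) @ l                      # y(x) = sum_i K(x, x_i) l_i
print(float(np.max(np.abs(y_hat - np.sinc(X_test[:, 0])))))  # rough fit error against the clean function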

21 / 69


Least Squares Support Vector Machine Regression

22 / 69


Least Squares Support Vector Machine Regression

Observations:

Kernel: N × N, symmetric positive definite matrix

q can be large (possibly ∞)

ϕ(x_i) nonlinearly maps data row x_i into a high- (possibly ∞-)dimensional space.

Not needed to know ϕ(·) explicitly. In machine learning, we fix a symmetric continuous kernel that satisfies Mercer’s condition:

∫ K(x, z) g(x) g(z) dx dz ≥ 0,

for any square integrable function g(x). Then K(x, z) separates: ∃ Hilbert space H, ∃ map φ(·) and ∃ λ_i > 0 such that

K(x, z) = ∑_i λ_i φ_i(x) φ_i(z).

Kernel trick: work out the ‘dual formulation’ with Lagrange multipliers; generate the ‘long’ (possibly ∞) inner products with ϕ(·) implicitly through the kernel.
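Mercer positive semidefiniteness can be spot-checked on any finite sample: the kernel matrix must have no negative eigenvalues. A small Python/NumPy check for the Gaussian (RBF) kernel on random points (my own illustration):

import numpy as np

rng = np.random.default_rng(5)
Z = rng.standard_normal((40, 3))
d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0)                                  # Gaussian kernel matrix on the sample

eigvals = np.linalg.eigvalsh(K)                        # symmetric, so eigvalsh
print(eigvals.min() >= -1e-10)                         # True: numerically positive semidefinite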

23 / 69


Least Squares Support Vector Machine Regression

Kernels:

Mathematical form: linear, polynomial, radial basis function, splines, wavelets, string kernels, kernels from graphical models, Fisher kernels, graph kernels, data fusion kernels, spike kernels, . . .

Application inspired: Text mining, bioinformatics, images, . . .

24 / 69


Least Squares Support Vector Machine Regression

             Linear Algebra   LS-SVM / Kernel
Supervised   Least squares    Function Estimation
             Classification   Kernel Classification
Unsupervised SVD - PCA        Kernel PCA
             Angles - CCA     Kernel CCA

25 / 69




Linear classification

28 / 69


Linear classification

29 / 69


LS-SVM classifier

30 / 69


LS-SVM classifier

31 / 69



             Linear Algebra   LS-SVM / Kernel
Supervised   Least squares    Function Estimation
             Classification   Kernel Classification
Unsupervised SVD - PCA        Kernel PCA
             Angles - CCA     Kernel CCA

33 / 69


PCA - SVD

Given data matrix X. Find vectors of maximum variance:

max_{w,e} e^T e,

subject to e = Xw, w^T w = 1.

Lagrangean:

L(w, e, l, λ) = (1/2) e^T e − l^T (e − Xw) + λ(1 − w^T w)

giving

0 = e − l
0 = X^T l − 2wλ
0 = e − Xw
1 = w^T w

34 / 69


PCA - SVD

Hence

l = Xw,  X^T l = w(2λ),  l^T l = 2λ.

Call v = l/√(2λ) and σ = √(2λ), then

Xw = vσ,  w^T w = 1
X^T v = wσ,  v^T v = 1

SVD !! So, the left singular vectors of X are Lagrange multipliers in a PCA problem.

Example: 12 600 genes, 72 patients:

- 28 Acute Lymphoblastic Leukemia (ALL)
- 24 Acute Myeloid Leukemia (AML)
- 20 Mixed Lineage Leukemia (MLL)
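The computation behind such an example is a single (economy-size) SVD of the centered data matrix; a Python/NumPy sketch on a synthetic stand-in of the same shape as the leukemia dataset (the real data are not reproduced here):

import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((72, 12600))                   # 72 patients x 12 600 genes (synthetic stand-in)
Xc = X - X.mean(axis=0)                                # center each gene

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)      # Xc = U diag(s) Vt
scores = U[:, :2] * s[:2]                              # coordinates of the 72 patients on the first 2 PCs
directions = Vt[:2]                                    # the 2 principal directions in gene space
print(scores.shape, directions.shape)                  # (72, 2) (2, 12600)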

35 / 69


PCA - SVD

36 / 69


PCA - SVD

37 / 69


Kernel PCA

38 / 69


Kernel PCA

39 / 69



             Linear Algebra   LS-SVM / Kernel
Supervised   Least squares    Function Estimation
             Classification   Kernel Classification
Unsupervised SVD - PCA        Kernel PCA
             Angles - CCA     Kernel CCA

41 / 69


Principal angles and directions. . .

Given two data matrices X, Y. Find directions in the column space of X, resp. Y, that maximally correlate:

min_{e,f,v,w} (1/2) ‖e − f‖_2^2,

subject to e = Xv, f = Yw, e^T e = 1, f^T f = 1.

Notice that

‖e − f‖_2^2 = 1 + 1 − 2 e^T f = 2(1 − cos θ)

Minimizing the distance = maximizing the cosine = minimizing the angle between the column spaces

42 / 69


Principal angles and directions. . .

Lagrangean:

L(e, f, v, w, a, b, α, β) = 1 − e^T f + a^T (e − Xv) + b^T (f − Y w) − α(1 − e^T e) − β(1 − f^T f)

resulting in

−f + a = −eα    e = Xv
−e + b = −fβ    f = Y w
X^T a = 0       e^T e = 1
Y^T b = 0       f^T f = 1

Eliminating a, b, e, f gives

X^T Y w = X^T X v α
Y^T X v = Y^T Y w β

Hence: α = β = λ (say) (= cos θ).

43 / 69


Principal angles and directions. . .

Principal angles and directions follow from the generalized EVP

[ 0      X^T Y ] [ v ]   [ X^T X   0     ] [ v ]
[ Y^T X  0     ] [ w ] = [ 0       Y^T Y ] [ w ] λ

with v^T X^T X v = w^T Y^T Y w = 1

Numerically correct way: use 3 SVD’s (see Golub/Van Loan)
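The numerically sound recipe in a few lines of Python/NumPy (my sketch: orthonormal bases for the two column spaces via QR, then one SVD of their product; Golub/Van Loan obtain the bases with SVDs, which serves the same purpose):

import numpy as np

def principal_angle_cosines(X, Y):
    Qx, _ = np.linalg.qr(X)                            # orthonormal basis for span(X)
    Qy, _ = np.linalg.qr(Y)                            # orthonormal basis for span(Y)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)  # singular values = cosines of principal angles

rng = np.random.default_rng(7)
X = rng.standard_normal((100, 3))
Y = np.column_stack([X[:, 0] + 0.01 * rng.standard_normal(100),   # one nearly shared direction
                     rng.standard_normal((100, 2))])
print(principal_angle_cosines(X, Y))                   # first cosine close to 1, the rest much smaller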

44 / 69


Kernel CCA

45 / 69


Kernel CCA

46 / 69


Kernel CCA

47 / 69



‘Core’ LS-SVM problems

49 / 69


Software

http://www.esat.kuleuven.be/sista/lssvmlab/

50 / 69


Enforcing sparsity

51 / 69


Robustness

52 / 69


Robustness

53 / 69


Fixed-size LS-SVM

54 / 69


Prediction intervals

55 / 69


Semi-supervised learning

56 / 69


Tensor data

57 / 69



Modeling a tsunami of data

59 / 69


Electric Load Forecasting

60 / 69


Electric Load Forecasting

61 / 69


Electric Load Forecasting

62 / 69


Medical applications

PACS UZ Leuven: 1.6 PetaByte
Genomics core, HiSeq 2000 at full speed (exome sequencing): 1 TeraByte / week
1 small animal image: 1 GigaByte
1 CD-ROM: 750 MegaByte
Sequencing all newborns by 2020 (125k births / year): 125 PetaByte / year
Index of 20 million biomedical PubMed records: 23 GigaByte
1 slice of mouse brain MSI at 10 μm resolution: 81 GigaByte
Raw NGS data of 1 full genome: 1 TeraByte

1 kB = 1 000 bytes, 1 MB = 1 000 000, 1 GB = 1 000 000 000, 1 TB = 1 000 000 000 000, 1 PB = 1 000 000 000 000 000
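Sanity check on the newborn-sequencing estimate: 125 000 genomes per year × 1 TeraByte of raw NGS data per genome ≈ 125 PetaByte per year, consistent with the figure above.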

63 / 69


Magnetic resonance spectroscopic imaging

64 / 69


Proteomics

65 / 69


Ranking from data fusion

66 / 69



Conclusions

LS-SVM = unifying framework for (un-)supervised ML tasks: regression, (predictive) modeling, clustering and classification, data dimensionality reduction, correlation analysis (spatial-temporal modeling), feature selection, (early - intermediate - late) data fusion, ranking, outlier detection

LS-SVMs form a core ingredient of decision support systems with a ‘human decision maker in-the-loop’: policies in climate, energy, pollution; clinical decision support: digital health; industrial decision support: yield, monitoring, emission control; zillions of application areas

A tsunami of Big Data (high dimensional input spaces, high complexity and interrelations, ...) is generated by Internet-of-Things multi-sensor networks, clinical monitoring equipment, etc.

Via the Kernel Trick: It’s all linear algebra !

68 / 69


Conclusions

STADIUS - SPIN-OFFS “Going beyond research”
www.esat.kuleuven.be/stadius/spinoffs.php

Automation & Optimization
Transport & Mobility research
Financial compliance
Data mining industry solutions
Automated PCR analysis
Data handling & mining for clinical genetics

69 / 69