Slide 1

11. 10. 2001, NIMIA, Crema, Italy

Identification and Neural Networks

I S R G

G. Horváth

Department of Measurement and Information Systems

Slide 2

Modular networks: why a modular approach

Motivations

Biological

Learning

Computational

Implementation

Slide 3

Motivations: biological

Biological systems are not homogeneous

Functional specialization

Fault tolerance

Cooperation, competition

Scalability

Extendibility

Slide 4

Motivations: complexity of learning (divide and conquer)

Training of complex networks (many layers): layer-by-layer learning

Speed of learning

Catastrophic interference, incremental learning

Mixing supervised and unsupervised learning

Hierarchical knowledge structure

Slide 5

Motivations: computational

The capacity of a network

The size of the network

Catastrophic interference

Generalization capability vs network complexity

Slide 6

Motivations: implementation (hardware)

The degree of parallelism

Number of connections

The length of physical connections

Fan out

Slide 7

Modular networks: what kind of modules

Every module solves the same, whole problem; the modules disagree on some inputs, giving different ways of solution (different modules)

Every module solves a different task (sub-task): task decomposition (input space, output space)

Slide 8

Modular networks: how to combine modules

Cooperative modules: simple average; weighted average (fixed weights); optimal linear combination (OLC) of networks

Competitive modules: majority vote; winner-takes-all

Competitive/cooperative modules: weighted average (input-dependent weights); mixture of experts (MOE)
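The combination rules listed above are simple enough to state in a few lines. The following is a minimal sketch (my own illustration, not taken from the lecture), assuming the module outputs are already available as NumPy arrays; all numbers are made up.

```python
import numpy as np

# Outputs of M = 3 modules for the same input (regression case)
y = np.array([0.9, 1.1, 1.4])

# Cooperative combination
simple_average = y.mean()
alpha = np.array([0.5, 0.3, 0.2])          # fixed weights, summing to 1
weighted_average = alpha @ y

# Competitive combination (classification case): class labels from 3 modules
votes = np.array([2, 1, 2])
majority = np.bincount(votes).argmax()      # majority vote
winner_takes_all = y.argmax()               # index of the "strongest" module

print(simple_average, weighted_average, majority, winner_takes_all)
```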

Slide 9

Modular networks: construction of modular networks

Task decomposition, subtask definition

Training modules for solving subtasks

Integration of the results

(cooperation and/or competition)

Slide 11

Cooperative networks: ensemble of cooperating networks (classification/regression)

Motivation, heuristic explanation:

Different experts together can solve a problem better

Complementary knowledge

Mathematical justification: accurate and diverse modules

Slide 12

Ensemble of networks: mathematical justification

Ensemble output: $y(\mathbf{x}) = \sum_{j} \alpha_j y_j(\mathbf{x})$

Individual error: $\varepsilon_j(\mathbf{x}) = \big(d(\mathbf{x}) - y_j(\mathbf{x})\big)^2$

Ambiguity (diversity): $a_j(\mathbf{x}) = \big(y_j(\mathbf{x}) - y(\mathbf{x})\big)^2$

Ensemble error: $e(\mathbf{x}) = \big(d(\mathbf{x}) - y(\mathbf{x})\big)^2$

Constraint: $\sum_{j} \alpha_j = 1, \quad \alpha_j \ge 0$

Slide 13

Ensemble of networks: mathematical justification (cont'd)

Weighted error: $\bar{\varepsilon}(\mathbf{x}) = \sum_{j} \alpha_j\, \varepsilon_j(\mathbf{x})$

Weighted diversity: $\bar{a}(\mathbf{x}) = \sum_{j} \alpha_j\, a_j(\mathbf{x})$

Ensemble error: $e(\mathbf{x}) = \bar{\varepsilon}(\mathbf{x}) - \bar{a}(\mathbf{x})$

Averaging over the input distribution $f(\mathbf{x})$:
$E = \int e(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x}, \qquad \bar{E} = \int \bar{\varepsilon}(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x}, \qquad \bar{A} = \int \bar{a}(\mathbf{x}) f(\mathbf{x})\, d\mathbf{x}$

$E = \bar{E} - \bar{A}$

Solution: an ensemble of accurate (small $\bar{E}$) and diverse (large $\bar{A}$) networks
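The relation $E = \bar{E} - \bar{A}$ holds pointwise and after averaging, and it is easy to check numerically. A small sketch (my own, with randomly generated member outputs and weights summing to one):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 1000                                   # ensemble members, input samples
d = rng.normal(size=N)                           # desired output d(x)
y_j = d + rng.normal(scale=0.5, size=(M, N))     # member outputs y_j(x)
alpha = rng.random(M); alpha /= alpha.sum()      # weights, sum to 1

y = alpha @ y_j                                  # ensemble output
eps = alpha @ (d - y_j) ** 2                     # weighted member error
amb = alpha @ (y_j - y) ** 2                     # weighted ambiguity
ens = (d - y) ** 2                               # ensemble error

E_bar, A_bar, E = eps.mean(), amb.mean(), ens.mean()
print(E, E_bar - A_bar)                          # the two numbers coincide: E = E_bar - A_bar
```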

Slide 14

Ensemble of networks: how to get accurate and diverse networks

different structures: more than one network structure (e.g. MLP, RBF, CCN, etc.)

networks of different size and complexity (number of hidden units, number of layers, nonlinear function, etc.)

different learning strategies (BP, CG, random search, etc.); batch learning or sequential learning

different training algorithms, sample order, learning samples

different training parameters

different starting parameter values

different stopping criteria
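Several of the items above (different sizes, different starting parameter values) can be realized simply by retraining the same architecture with different settings. A hedged illustration using scikit-learn's MLPRegressor (my own choice of tool, not the one used in the lecture):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=200)

# Diversity from different sizes and different starting parameter values (seeds)
members = [MLPRegressor(hidden_layer_sizes=(h,), random_state=s,
                        max_iter=2000).fit(X, y)
           for h, s in [(5, 0), (10, 1), (20, 2), (10, 3)]]

preds = np.stack([m.predict(X) for m in members])
print(((preds.mean(axis=0) - y) ** 2).mean())   # error of the simple-average ensemble
```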

Slide 15

Linear combination of networks

[Figure: M parallel networks NN1 ... NNM receive the common input x; their outputs y1 ... yM, together with a constant y0 = 1, are weighted by the coefficients α0 ... αM and summed (Σ) to give the combined output.]

$y(\mathbf{x}) = \sum_{j=0}^{M} \alpha_j y_j(\mathbf{x}), \qquad y_0 = 1$

Slide 16

Linear combination of networks

Computation of the optimal coefficients:

simple average: $\alpha_k = \frac{1}{M}, \quad k = 1, \ldots, M$

input-dependent selection: $\alpha_k = 1$, $\alpha_j = 0$ for $j \ne k$, where $k$ depends on the input (for different input domains a different network alone gives the output)

optimal values using the constraint $\sum_k \alpha_k = 1$

optimal values without any constraint: Wiener-Hopf equation
$\boldsymbol{\alpha}^{*} = \mathbf{R}^{-1}\mathbf{P}, \qquad \mathbf{R} = E\{\mathbf{y}(\mathbf{x})\,\mathbf{y}(\mathbf{x})^{T}\}, \qquad \mathbf{P} = E\{d(\mathbf{x})\,\mathbf{y}(\mathbf{x})\}$
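With R and P replaced by sample averages over a validation set, the unconstrained optimum reduces to solving one linear system. A minimal sketch (my own, with synthetic member outputs):

```python
import numpy as np

rng = np.random.default_rng(2)
P_samples, M = 500, 3
d = rng.normal(size=P_samples)                         # desired output d(x)
Y = d + rng.normal(scale=0.3, size=(P_samples, M))     # member outputs, one column per network

R = Y.T @ Y / P_samples            # sample estimate of R = E{y y^T}
p = Y.T @ d / P_samples            # sample estimate of P = E{d y}
alpha_opt = np.linalg.solve(R, p)  # Wiener-Hopf: alpha* = R^{-1} P

y_comb = Y @ alpha_opt
print(alpha_opt, ((d - y_comb) ** 2).mean())
```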

Slide 17

Task decomposition: decomposition related to learning

before learning (subtask definition)

during learning (automatic task decomposition)

Problem-space decomposition:

input space (input-space clustering, definition of different input regions)

output space (desired response)

Slide 18

Task decomposition: decomposition into separate subproblems

K-class classification decomposed into K two-class problems (coarse decomposition)

Complex two-class problems decomposed into smaller two-class problems (fine decomposition)

Integration (module combination)

Slide 19

Task decomposition: a 3-class problem

Slide 20

Task decomposition: 3 classes

[Figure: decomposition into two groups of 2 small classes each.]

Slide 21

Task decomposition: 3 classes

[Figure: 2 classes, each further decomposed into 2 small classes.]

Slide 22

Task decomposition: 3 classes

[Figure: the resulting small two-class subproblems (2 small classes against 2 small classes).]

Slide 23

Task decomposition

[Figure: integration network for the 3-class problem. The pairwise modules M12, M13, M23 share the common input; each class output C1, C2, C3 is obtained by a MIN unit over the relevant module outputs, with inverter (INV) units where a module's output has to be negated.]

Slide 24

Task decomposition: a two-class problem decomposed into subtasks

Slide 25

Task decomposition

[Figure: the subtask modules are combined as (M11 AND M12) OR (M21 AND M22).]

Slide 26

Task decomposition

[Figure: the same combination with continuous module outputs: C1 = MAX( MIN(M11, M12), MIN(M21, M22) ), all modules receiving the common input.]

Slide 27

Task decomposition: training set decomposition

Original training set: $T = \{\mathbf{x}^{(l)}, y^{(l)}\}_{l=1}^{L}$

Training set for each of the K two-class problems: $T_i = \{\mathbf{x}^{(l)}, y_i^{(l)}\}_{l=1}^{L}, \quad i = 1, \ldots, K$, where
$y_i^{(l)} = \begin{cases} +1 & \text{if } \mathbf{x}^{(l)} \in \text{class } C_i \\ -1 & \text{if } \mathbf{x}^{(l)} \in \text{all classes except } C_i \end{cases}$

Each of the two-class problems is divided into K-1 smaller two-class problems [using an inverter module, (K-1)/2 of them are really enough]:
$T_{ij} = \{\mathbf{x}_i^{(l)}, +1\}_{l=1}^{L_i} \cup \{\mathbf{x}_j^{(l)}, -1\}_{l=1}^{L_j}, \quad i, j = 1, \ldots, K, \ j \ne i$
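As an illustration of this decomposition (a sketch of mine, using scikit-learn logistic-regression modules rather than the lecture's networks), the 3-class example can be handled with one module per class pair and a MIN-type integration, inverting a module's output where needed:

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=300, centers=3, random_state=0)   # a 3-class toy problem
K = 3

# One module M_ij per class pair, trained only on the samples of classes i and j
modules = {}
for i, j in combinations(range(K), 2):
    mask = np.isin(y, [i, j])
    modules[(i, j)] = LogisticRegression().fit(X[mask], (y[mask] == i).astype(int))

def classify(x):
    x = x.reshape(1, -1)
    scores = []
    for c in range(K):
        outs = []
        for (i, j), m in modules.items():
            p = m.predict_proba(x)[0, 1]        # probability of class i
            if c == i:
                outs.append(p)
            elif c == j:
                outs.append(1 - p)              # inverter (INV) module
        scores.append(min(outs))                # MIN integration per class
    return int(np.argmax(scores))

print(sum(classify(x) == t for x, t in zip(X, y)) / len(y))    # training accuracy
```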

Slide 28

Task decomposition: a practical example, ZIP code recognition

[Figure: preprocessing pipeline. The input digit (16 x 16 pixels) is normalized, then edge detection with Kirsch masks produces four 16 x 16 feature maps (horizontal, vertical, diagonal \ and diagonal / edges), which are subsampled to four 8 x 8 matrices.]

Slide 29

Task decomposition: ZIP code recognition (handwritten character recognition), modular solution

45 = K(K-1)/2 pairwise neurons (K = 10 classes)

10 AND gates (MIN operators)

256 + 1 inputs

Slide 30

Mixture of Experts (MOE)

[Figure: M expert networks (Expert 1 ... Expert M) and a gating network all receive the input x. The gating network produces the weights g1 ... gM; the expert outputs μ1 ... μM are weighted by them and summed (Σ) to give the overall output μ.]

Slide 31

Mixture of Experts (MOE)

The output is the weighted sum of the outputs of the experts:
$\boldsymbol{\mu} = \sum_{i=1}^{M} g_i \boldsymbol{\mu}_i, \qquad \boldsymbol{\mu}_i = f_i(\mathbf{x}, \boldsymbol{\Theta}_i)$
where $\boldsymbol{\Theta}_i$ is the parameter of the i-th expert.

The output of the gating network is the "softmax" function:
$g_i = \frac{e^{u_i}}{\sum_{j=1}^{M} e^{u_j}}, \qquad u_i = \mathbf{v}_i^T \mathbf{x}, \qquad 0 \le g_i \le 1, \quad \sum_{i=1}^{M} g_i = 1$
where $\mathbf{v}_i$ is the parameter of the gating network.
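A direct transcription of these two formulas, for linear experts with made-up parameters (my own sketch, not the trained system from the lecture):

```python
import numpy as np

def moe_output(x, expert_params, gating_params):
    """Mixture-of-experts forward pass with linear experts and softmax gating."""
    mu_i = np.array([W @ x for W in expert_params])   # expert outputs mu_i
    u = gating_params @ x                             # u_i = v_i^T x
    g = np.exp(u - u.max()); g /= g.sum()             # softmax gating, sums to 1
    return g @ mu_i, g                                # mu = sum_i g_i mu_i

rng = np.random.default_rng(3)
M, n_in = 3, 4
experts = [rng.normal(size=n_in) for _ in range(M)]   # one weight vector per scalar-output expert
gating = rng.normal(size=(M, n_in))                   # one gating weight vector v_i per expert
mu, g = moe_output(rng.normal(size=n_in), experts, gating)
print(mu, g)
```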

Slide 32

Mixture of Experts (MOE): probabilistic interpretation

$\boldsymbol{\mu}_i = E[\mathbf{y} \,|\, \mathbf{x}, \boldsymbol{\Theta}_i], \qquad g_i = P(i \,|\, \mathbf{x}, \mathbf{v}_i)$

The probabilistic model with true parameters $\boldsymbol{\Theta}^0$:
$P(\mathbf{y} \,|\, \mathbf{x}, \boldsymbol{\Theta}^0) = \sum_{i} g_i(\mathbf{x}, \mathbf{v}_i^0)\, P(\mathbf{y} \,|\, \mathbf{x}, \boldsymbol{\Theta}_i^0)$

a priori probability: $g_i(\mathbf{x}, \mathbf{v}_i^0) = P(i \,|\, \mathbf{x}, \mathbf{v}_i^0)$

Slide 33

Mixture of Experts (MOE): training

Training data: $X = \{\mathbf{x}^{(l)}, \mathbf{y}^{(l)}\}_{l=1}^{L}$

Probability of generating the output from the input:
$P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}) = \sum_{i} P(i \,|\, \mathbf{x}^{(l)}, \mathbf{v}_i)\, P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}_i)$

$P(\mathbf{y} \,|\, \mathbf{x}, \boldsymbol{\Theta}) = \prod_{l=1}^{L} P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}) = \prod_{l=1}^{L} \sum_{i} P(i \,|\, \mathbf{x}^{(l)}, \mathbf{v}_i)\, P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}_i)$

The log-likelihood function (maximum-likelihood estimation):
$L(\boldsymbol{\Theta}, X) = \sum_{l} \log \sum_{i} P(i \,|\, \mathbf{x}^{(l)}, \mathbf{v}_i)\, P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}_i)$

Slide 34

Mixture of Experts (MOE): training (cont'd)

Gradient method: the log-likelihood $L(\boldsymbol{\Theta}, X)$ is maximized by gradient ascent with respect to the expert parameters $\boldsymbol{\Theta}_i$ and the gating parameters $\mathbf{v}_i$.

The parameters of the expert network:
$\boldsymbol{\Theta}_i(k+1) = \boldsymbol{\Theta}_i(k) + \eta \sum_{l=1}^{L} h_i^{(l)} \big(\mathbf{y}^{(l)} - \boldsymbol{\mu}_i^{(l)}\big)$

The parameters of the gating network:
$\mathbf{v}_i(k+1) = \mathbf{v}_i(k) + \eta \sum_{l=1}^{L} \big(h_i^{(l)} - g_i^{(l)}\big)\, \mathbf{x}^{(l)}$

Slide 35

Mixture of Experts (MOE): training (cont'd)

A priori probability: $g_i^{(l)} = g_i(\mathbf{x}^{(l)}, \mathbf{v}_i) = P(i \,|\, \mathbf{x}^{(l)}, \mathbf{v}_i)$

A posteriori probability:
$h_i^{(l)} = \frac{g_i^{(l)}\, P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}_i)}{\sum_{j} g_j^{(l)}\, P(\mathbf{y}^{(l)} \,|\, \mathbf{x}^{(l)}, \boldsymbol{\Theta}_j)}$

Slide 36

Mixture of Experts (MOE): training (cont'd)

EM (Expectation-Maximization) algorithm

A general iterative technique for maximum-likelihood estimation: introduce hidden variables and define a log-likelihood function over them

Two steps: expectation of the hidden variables; maximization of the log-likelihood function

Slide 37

EM (Expectation-Maximization) algorithm: a simple example, estimating the means of k (= 2) Gaussians

[Figure: the two component densities f(y | μ1) and f(y | μ2) drawn over the measurements.]

Slide 38

EM (Expectation-Maximization) algorithm: a simple example, estimating the means of k (= 2) Gaussians

Hidden variables for every observation: $(x^{(l)}, z_1^{(l)}, z_2^{(l)})$, where
$z_1^{(l)} = 1,\ z_2^{(l)} = 0$ if $x^{(l)} \in X^{(1)}$ (generated by the first Gaussian), and
$z_1^{(l)} = 0,\ z_2^{(l)} = 1$ if $x^{(l)} \in X^{(2)}$ (generated by the second Gaussian)

Likelihood function: $f(x^{(l)}, \boldsymbol{\mu}) = \prod_{i=1}^{k} \big[f_i(x^{(l)} \,|\, \mu_i)\big]^{z_i^{(l)}}$

Log-likelihood function: $\log f(x^{(l)}, \boldsymbol{\mu}) = \sum_{i=1}^{k} z_i^{(l)} \log f_i(x^{(l)} \,|\, \mu_i)$

Expected value of $z_i^{(l)}$ with $\mu_1$ and $\mu_2$ given:
$E[z_1^{(l)}] = \frac{f(x^{(l)} \,|\, \mu_1)}{\sum_{j=1}^{2} f(x^{(l)} \,|\, \mu_j)}, \qquad E[z_2^{(l)}] = \frac{f(x^{(l)} \,|\, \mu_2)}{\sum_{j=1}^{2} f(x^{(l)} \,|\, \mu_j)}$

Slide 39

Mixture of Experts (MOE): a simple example, estimating the means of k (= 2) Gaussians

Expected log-likelihood function:
$E[\mathcal{L}] = E\Big[\sum_{i=1}^{k} z_i^{(l)} \log f_i(x^{(l)} \,|\, \mu_i)\Big] = \sum_{i=1}^{k} \frac{f(x^{(l)} \,|\, \mu_i)}{\sum_{j=1}^{2} f(x^{(l)} \,|\, \mu_j)} \log f_i(x^{(l)} \,|\, \mu_i)$

where
$f(x^{(l)} \,|\, \mu_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Big[-\frac{1}{2\sigma^2}\big(x^{(l)} - \mu_i\big)^2\Big]$

$\log f(x^{(l)} \,|\, \mu_i) = \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2}\big(x^{(l)} - \mu_i\big)^2$

The estimate of the means:
$\hat{\mu}_i = \frac{\sum_{l=1}^{L} E[z_i^{(l)}]\, x^{(l)}}{\sum_{l=1}^{L} E[z_i^{(l)}]}$
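Iterating the expectation of the hidden variables and the re-estimation of the means gives the EM loop. A small sketch (my own code following the equations above, with equal and known variances):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])  # L samples
sigma = 1.0
mu = np.array([-1.0, 1.0])                       # initial guesses for mu_1, mu_2

for _ in range(50):
    # E-step: expected values of the hidden indicators z_i
    f = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma ** 2))   # shape (L, 2); common factor cancels
    Ez = f / f.sum(axis=1, keepdims=True)
    # M-step: re-estimate the means from the responsibilities
    mu = (Ez * x[:, None]).sum(axis=0) / Ez.sum(axis=0)

print(mu)        # close to the true means -2 and 3
```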

Slide 40

Mixture of Experts (MOE): applications

Simple experts: linear experts

ECG diagnostics

Mixture of Kalman filters

Discussion: comparison to non-modular architecture

Slide 41

Support vector machines: a new approach

Gives answers to questions not solved by the classical approach:

The size of the network

The generalization capability

Slide 42

Support vector machines: classification, optimal hyperplane

[Figure: comparison of classical neural learning and Support Vector Machine learning.]

Slide 43

$\mathbf{w} = [w_0, w_1, \ldots, w_M]^T, \qquad \mathbf{x} = [x_0, x_1, \ldots, x_M]^T$

$y = \mathbf{w}^T \mathbf{x} = \sum_{j=0}^{M} w_j x_j$

VC dimension

Slide 44

$v_{\mathrm{guaranteed}} = v_{\mathrm{teach}} + f(VC, \ldots)$

(the guaranteed generalization error is the training error plus a confidence term that depends on the VC dimension)

Structural risk minimization

Slide 45

Support vector machines: linearly separable two-class problem

Training set: $\{\mathbf{x}_i, y_i\}_{i=1}^{P}, \qquad y_i = +1 \text{ if } \mathbf{x}_i \in X_1, \quad y_i = -1 \text{ if } \mathbf{x}_i \in X_2$

Separating hyperplane: $\mathbf{w}^T \mathbf{x} + b = 0$

$\mathbf{w}^T \mathbf{x}_i + b \ge +1 \text{ if } \mathbf{x}_i \in X_1, \qquad \mathbf{w}^T \mathbf{x}_i + b \le -1 \text{ if } \mathbf{x}_i \in X_2$

in compact form: $y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1, \quad \forall i$

[Figure: the optimal (maximum-margin) hyperplane.]

Slide 46

Support vector machines: geometric interpretation

Distance of a point from the hyperplane: $d(\mathbf{x}, \mathbf{w}, b) = \frac{\mathbf{w}^T \mathbf{x} + b}{\|\mathbf{w}\|}$

Margin:
$\rho(\mathbf{w}, b) = \min_{\{\mathbf{x}_i;\, y_i=+1\}} d(\mathbf{x}_i, \mathbf{w}, b) + \min_{\{\mathbf{x}_i;\, y_i=-1\}} \big(-d(\mathbf{x}_i, \mathbf{w}, b)\big) = \frac{1}{\|\mathbf{w}\|} + \frac{1}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|}$

[Figure: geometric interpretation; d(x) is the distance of a point x from the separating hyperplane, and the margin is measured between the closest points x1 and x2 of the two classes.]

Slide 47

Support vector machines: criterion function, Lagrange function

A constrained optimization problem: minimize $\Phi(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2$ subject to $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$

Lagrange function:
$J(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_{i=1}^{P} \alpha_i \big[y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1\big], \qquad \max_{\boldsymbol{\alpha}} \min_{\mathbf{w}, b} J(\mathbf{w}, b, \boldsymbol{\alpha})$

Conditions:
$\frac{\partial J}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{P} \alpha_i y_i \mathbf{x}_i, \qquad \frac{\partial J}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{P} \alpha_i y_i = 0$

Dual problem:
$\max_{\boldsymbol{\alpha}} W(\boldsymbol{\alpha}) = \max_{\boldsymbol{\alpha}} \Big\{ \sum_{i=1}^{P} \alpha_i - \frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i^T \mathbf{x}_j \Big\}, \qquad \alpha_i \ge 0, \quad \sum_{i=1}^{P} \alpha_i y_i = 0$

Support vectors: the $\mathbf{x}_i$ with $\alpha_i > 0$

Optimal hyperplane: $\mathbf{w}_0 = \sum_{i=1}^{P} \alpha_i y_i \mathbf{x}_i$
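In practice the dual quadratic program is solved by a library; the fragment below (an illustrative sketch assuming scikit-learn, not the lecture's own software) fits a linear maximum-margin classifier on separable toy data and reads off the support vectors and the margin 2/||w||:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=5)
svm = SVC(kernel="linear", C=1e6).fit(X, y)      # very large C approximates the hard margin

w = svm.coef_[0]
b = svm.intercept_[0]
margin = 2 / np.linalg.norm(w)                   # rho = 2 / ||w||
print("support vectors:", svm.support_vectors_)
print("w =", w, "b =", b, "margin =", margin)
```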

Slide 48

Support vector machines: linearly nonseparable case

Separating hyperplane (with slack variables): $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, P$

Criterion function: $\Phi(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{P}\xi_i$

Lagrange function:
$J(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P}\xi_i - \sum_{i=1}^{P}\alpha_i\big[y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 + \xi_i\big] - \sum_{i=1}^{P}\mu_i\xi_i$

with $0 \le \alpha_i \le C$

Support vectors: the $\mathbf{x}_i$ with $\alpha_i > 0$

Optimal hyperplane: $\mathbf{w}_0 = \sum_{i=1}^{P} \alpha_i y_i \mathbf{x}_i$

Slide 49

Support vector machines: nonlinear separation

Separating hyperplane (in the space of the basis functions, with $\varphi_0(\mathbf{x}) = 1$): $\mathbf{w}^T\boldsymbol{\varphi}(\mathbf{x}) = \sum_{j=0}^{M} w_j \varphi_j(\mathbf{x}) = 0$

Decision surface: $\sum_{i=1}^{P} \alpha_i y_i K(\mathbf{x}, \mathbf{x}_i) = \sum_{i=1}^{P} \alpha_i y_i \sum_{j=0}^{M} \varphi_j(\mathbf{x})\varphi_j(\mathbf{x}_i) = 0$

Kernel function: $K(\mathbf{x}_i, \mathbf{x}_j) = \boldsymbol{\varphi}^T(\mathbf{x}_i)\, \boldsymbol{\varphi}(\mathbf{x}_j)$

Criterion function (dual):
$W(\boldsymbol{\alpha}) = \sum_{i=1}^{P} \alpha_i - \frac{1}{2} \sum_{i=1}^{P} \sum_{j=1}^{P} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$

Slide 50

Support vector machines: examples of SVM kernels

Polynomial: $K(\mathbf{x}, \mathbf{x}_i) = (\mathbf{x}^T \mathbf{x}_i + 1)^d, \quad d = 1, 2, \ldots$

RBF: $K(\mathbf{x}, \mathbf{x}_i) = \exp\Big(-\frac{1}{2\sigma^2}\|\mathbf{x} - \mathbf{x}_i\|^2\Big)$

MLP (sigmoid): $K(\mathbf{x}, \mathbf{x}_i) = \tanh\big(\beta_0\, \mathbf{x}^T \mathbf{x}_i + \beta_1\big)$
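These three kernels map directly onto the kernel options of common SVM libraries. A short sketch (assuming scikit-learn; the parameter values are arbitrary):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=200, noise=0.2, random_state=6)

models = {
    "polynomial": SVC(kernel="poly", degree=3, coef0=1.0),     # (x^T x_i + 1)^d
    "RBF":        SVC(kernel="rbf", gamma=0.5),                # exp(-gamma ||x - x_i||^2)
    "MLP-like":   SVC(kernel="sigmoid", gamma=0.1, coef0=-1),  # tanh(gamma x^T x_i + coef0)
}
for name, m in models.items():
    print(name, m.fit(X, y).score(X, y))
```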

Slide 51

Support vector machines: example, polynomial kernel (d = 2, two-dimensional input)

Kernel function:
$K(\mathbf{x}, \mathbf{x}_i) = (1 + \mathbf{x}^T \mathbf{x}_i)^2 = 1 + x_1^2 x_{i1}^2 + 2 x_1 x_2 x_{i1} x_{i2} + x_2^2 x_{i2}^2 + 2 x_1 x_{i1} + 2 x_2 x_{i2}$

Basis functions:
$\boldsymbol{\varphi}(\mathbf{x}_i) = \big[1,\; x_{i1}^2,\; \sqrt{2}\, x_{i1} x_{i2},\; x_{i2}^2,\; \sqrt{2}\, x_{i1},\; \sqrt{2}\, x_{i2}\big]^T$
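The identity $K(\mathbf{x}, \mathbf{x}_i) = \boldsymbol{\varphi}^T(\mathbf{x})\,\boldsymbol{\varphi}(\mathbf{x}_i)$ is easy to verify numerically; a short check of my own with arbitrary vectors:

```python
import numpy as np

def phi(x):
    # Basis functions of the d = 2 polynomial kernel in two dimensions
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2, np.sqrt(2)*x1, np.sqrt(2)*x2])

x, xi = np.array([0.7, -1.2]), np.array([2.0, 0.5])
print((1 + x @ xi) ** 2, phi(x) @ phi(xi))   # the two values agree
```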

Slide 52

SVM (classification): summary

Separable samples:
Minimize: $\Phi(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T\mathbf{w}$
Constraint: $d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1, \quad i = 1, 2, \ldots, P$

Not separable samples:
Minimize: $\Phi(\mathbf{w}, \boldsymbol{\xi}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P}\xi_i$
Constraint: $d_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, P$

By minimizing $\Phi(\mathbf{w}) = \frac{1}{2}\mathbf{w}^T\mathbf{w}$ we maximize the distance between the classes, whilst we also control the VC dimension.

Slide 53

SVR (regression)

The $\varepsilon$-insensitive cost function $C_\varepsilon(\cdot)$:
$C_\varepsilon\big(y, f(\mathbf{x}, \mathbf{w})\big) = \begin{cases} \big|y - f(\mathbf{x}, \mathbf{w})\big| - \varepsilon & \text{if } \big|y - f(\mathbf{x}, \mathbf{w})\big| \ge \varepsilon \\ 0 & \text{otherwise} \end{cases}$

Slide 54

SVR (regression)

Regression function: $y = \sum_{j=0}^{M} w_j \varphi_j(\mathbf{x})$

Minimize: $F(\mathbf{w}, \boldsymbol{\xi}, \boldsymbol{\xi}') = \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P}\big(\xi_i + \xi_i'\big)$

Constraints:
$d_i - \mathbf{w}^T\mathbf{x}_i \le \varepsilon + \xi_i, \qquad \mathbf{w}^T\mathbf{x}_i - d_i \le \varepsilon + \xi_i', \qquad \xi_i \ge 0, \quad \xi_i' \ge 0, \quad i = 1, 2, \ldots, P$
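This is the formulation that standard SVR implementations solve; the sketch below (my own, assuming scikit-learn) fits an RBF-kernel SVR, where epsilon is the width of the insensitive tube and C penalizes the slack variables:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = np.sort(rng.uniform(-3, 3, size=(80, 1)), axis=0)
y = np.sinc(X[:, 0]) + 0.1 * rng.normal(size=80)

# epsilon: width of the insensitive tube; C: penalty on the slack variables
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("number of support vectors:", len(svr.support_))
print("fit error:", ((svr.predict(X) - y) ** 2).mean())
```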

Slide 55

SVR (regression): Lagrange function, dual problem

Lagrange function:
$J(\mathbf{w}, \boldsymbol{\xi}, \boldsymbol{\xi}', \boldsymbol{\alpha}, \boldsymbol{\alpha}', \boldsymbol{\gamma}, \boldsymbol{\gamma}') = \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{P}(\xi_i + \xi_i') - \sum_{i=1}^{P}\alpha_i\big(\mathbf{w}^T\mathbf{x}_i - y_i + \varepsilon + \xi_i\big) - \sum_{i=1}^{P}\alpha_i'\big(y_i - \mathbf{w}^T\mathbf{x}_i + \varepsilon + \xi_i'\big) - \sum_{i=1}^{P}\big(\gamma_i\xi_i + \gamma_i'\xi_i'\big)$

Dual problem:
$W(\boldsymbol{\alpha}, \boldsymbol{\alpha}') = \sum_{i=1}^{P} y_i(\alpha_i - \alpha_i') - \varepsilon\sum_{i=1}^{P}(\alpha_i + \alpha_i') - \frac{1}{2}\sum_{i=1}^{P}\sum_{j=1}^{P}(\alpha_i - \alpha_i')(\alpha_j - \alpha_j')\, K(\mathbf{x}_i, \mathbf{x}_j)$

Constraints: $\sum_{i=1}^{P}(\alpha_i - \alpha_i') = 0, \qquad 0 \le \alpha_i \le C, \qquad 0 \le \alpha_i' \le C$

Support vectors: the $\mathbf{x}_i$ for which $\alpha_i - \alpha_i' \ne 0$

Solution: $\mathbf{w} = \sum_{i=1}^{P}(\alpha_i - \alpha_i')\,\mathbf{x}_i$

Slide 56

SVR (regression)

Slide 57

SVR (regression)

Slide 58

SVR (regression)

Slide 59

SVR (regression)

Slide 60

Support vector machines: main advantages

generalization

size of the network

centre parameters for RBF

linear-in-the-parameter structure

noise immunity

Slide 61

Support vector machines: main disadvantages

computation intensive (quadratic optimization)

hyper-parameter selection

VC dimension (classification)

batch processing

Slide 62

Support vector machines: variants

LS-SVM (least-squares SVM)

basic criterion function

Advantages: easier to compute, adaptivity
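The slide does not spell the LS-SVM criterion out; in the usual least-squares formulation the inequality constraints become equalities, the squared slack variables are penalized, and training reduces to a single linear system instead of a quadratic program. A sketch of that standard regression formulation with an RBF kernel (an assumption of mine, not necessarily the exact variant meant here):

```python
import numpy as np

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the standard LS-SVM regression system [[0, 1^T], [1, K + I/gamma]] [b; a] = [0; y]."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / (2 * sigma ** 2))                        # RBF kernel matrix
    n = len(y)
    A = np.block([[np.zeros((1, 1)), np.ones((1, n))],
                  [np.ones((n, 1)), K + np.eye(n) / gamma]])
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:], X, sigma                          # bias, alpha, training data

def lssvm_predict(model, Xq):
    b, a, Xtr, sigma = model
    sq = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2)) @ a + b

rng = np.random.default_rng(8)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=100)
model = lssvm_fit(X, y)
print(((lssvm_predict(model, X) - y) ** 2).mean())
```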

Slide 63

Mixture of SVMs

Problem of hyper-parameter selection for SVMs

Different SVMs, with different hyper-parameters

Soft separation of the input space

Slide 64

Mixture of SVMs

Slide 65

Boosting techniques

Boosting by filtering

Boosting by subsampling

Boosting by reweighting

Slide 66

Boosting techniques: boosting by filtering

Slide 67

Boosting techniques: boosting by subsampling

Slide 68

Boosting techniques: boosting by reweighting
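The slide itself only shows a diagram. As a concrete illustration of reweighting, the sketch below follows the standard AdaBoost recipe (my own code, not necessarily the lecture's exact variant): each round refits a weak learner with sample weights that emphasize the previously misclassified points.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=300, noise=0.25, random_state=9)
y_pm = 2 * y - 1                                  # labels in {-1, +1}
w = np.full(len(y), 1 / len(y))                   # initial sample weights
learners, alphas = [], []

for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y_pm].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
    w *= np.exp(-alpha * y_pm * pred)             # reweighting: misclassified samples gain weight
    w /= w.sum()
    learners.append(stump); alphas.append(alpha)

ensemble = np.sign(sum(a * m.predict(X) for a, m in zip(alphas, learners)))
print("training accuracy:", (ensemble == y_pm).mean())
```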

Slide 69

Other modular architectures

Slide 70

Other modular architectures

Slide 71

Other modular architectures: modular classifiers

Decoupled modules

Hierarchical modules

Network ensemble (linear combination)

Network ensemble (decision, voting)

Slide 72

Modular architectures