Transfer functions: hidden possibilities for better neural networks.
Włodzisław Duch and Norbert Jankowski Department of Computer Methods,
Nicholas Copernicus University, Torun, Poland.
http://www.phys.uni.torun.pl/kmk
Why is this an important issue?
MLPs are universal approximators - no need for other TF?
Wrong bias => poor results, complex networks.
Examples of 2-class problems:
Class 1 inside the sphere, Class 2 outside.
MLP: at least N+1 hyperplanes, O(N²) parameters.
RBF: 1 Gaussian, O(N) parameters.
Class 1 in the corner defined by the (1,1,...,1) hyperplane, Class 2 outside.
MLP: 1 hyperplane, O(N) parameters.
RBF: many Gaussians, O(N²) parameters, poor approximation.
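The sphere example above can be sketched in a few lines of numpy. The radius sqrt(N) and centre at the origin are my illustrative assumptions (the slide does not fix them): one Gaussian node with O(N) parameters separates the classes exactly, while a single hyperplane stays near chance level.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                   # input dimension
X = rng.normal(size=(2000, N))
labels = (np.linalg.norm(X, axis=1) < np.sqrt(N)).astype(int)  # Class 1 inside

# One Gaussian node: centre + width = O(N) parameters.
center, width = np.zeros(N), np.sqrt(N)
g = np.exp(-np.linalg.norm(X - center, axis=1) ** 2 / width ** 2)
pred_rbf = (g > np.exp(-1.0)).astype(int)   # threshold at distance == width

# A single hyperplane through the origin (one MLP node) cannot enclose a sphere.
w = rng.normal(size=N)
pred_mlp = (X @ w > 0).astype(int)

print((pred_rbf == labels).mean())   # 1.0 - exact separation
print((pred_mlp == labels).mean())   # close to 0.5
```

The Gaussian threshold exp(-1) fires exactly when the distance to the centre is below the width, so the single radial node reproduces the spherical decision border that an MLP would need O(N²) parameters to approximate.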
Inspirations
Logical rule: IF x1>0 & x2>0 THEN Class 1 ELSE Class 2
is not properly represented by either MLP or RBF!
Result: decision trees and logical rules perform significantly better than MLPs on some datasets (cf. hypothyroid)!
Speed of learning and network complexity depend on the TF. Fast learning requires flexible "brain modules" - TFs.
• Biological inspirations: sigmoidal neurons are a crude approximation of the basic level of neural tissue.
• Interesting brain functions are done by interacting minicolumns, implementing complex functions.
• Modular networks: networks of networks.
• First step beyond single neurons: transfer functions providing flexible decision borders.
Transfer functions
Transfer function f(I(X)): vector activation I(X) and scalar output o(I).
1. Fan-in, scalar product activation, hyperplanes:
$I(X; W) = W \cdot X = \sum_{i=1}^{N} W_i X_i$
2. Distance functions as activations, for example Gaussian functions:
$D(X; R) = \|X - R\|$, e.g. Euclidean $D^2 = \sum_{i=1}^{N} (X_i - R_i)^2$ or Chebyshev $D = \max_i |X_i - R_i|$
3. Mixed activation functions:
$A(X; W, R) = W \cdot X + \|X - R\|$
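The three activation types can be sketched in numpy. The additive form of `mixed` with weights α, β is an assumption on my part, since the slide's exact combination is garbled:

```python
import numpy as np

def fan_in(X, W):
    """Scalar-product activation I(X; W) = W.X, giving hyperplane borders."""
    return X @ W

def distance(X, R):
    """Distance activation D(X; R) = ||X - R||, giving radial borders."""
    return np.linalg.norm(X - R, axis=-1)

def mixed(X, W, R, alpha=1.0, beta=1.0):
    """Mixed activation: a weighted sum of the two (assumed form)."""
    return alpha * fan_in(X, W) + beta * distance(X, R)

X = np.array([[1.0, 2.0], [0.5, -1.0]])
W = np.array([0.3, -0.7])
R = np.array([1.0, 1.0])
print(fan_in(X, W))    # [-1.1   0.85]
print(distance(X, R))  # [1.     2.0615...]
```

Passing each activation through the same output function (e.g. a sigmoid) yields hyperplane, radial, or intermediate decision borders from one node type.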
Taxonomy - activation f.
Taxonomy - output f.
Taxonomy - TF
TF in Neural Networks
Choices:
1. Homogeneous NN: select the best TF, try several types. Ex: RBF networks; SVM kernels (today 50=>80% change).
2. Heterogeneous NN: one network, several types of TF. Ex: Adaptive Subspace SOM (Kohonen 1995), linear subspaces; projections on a space of basis functions.
3. Input enhancement: adding fi(X) to achieve separability. Ex: functional link networks (Pao 1989), tensor products of inputs; D-MLP model.
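A sketch of input enhancement (choice 3): for the logical rule from the Inspirations slide, one added feature makes the classes linearly separable. The choice f(X) = min(x1, x2) is mine for illustration; Pao's functional links typically use products or trigonometric terms of the inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)  # IF x1>0 & x2>0 THEN Class 1

# In the original 2D space no single hyperplane separates the first quadrant.
# Enhanced input (x1, x2, f(X)) with f(X) = min(x1, x2):
enhanced = np.column_stack([X, np.minimum(X[:, 0], X[:, 1])])

# The hyperplane "third coordinate > 0" now implements the rule exactly.
pred = (enhanced[:, 2] > 0).astype(int)
print((pred == y).mean())  # 1.0
```

min(x1, x2) > 0 holds exactly when both inputs are positive, so a single linear node in the enhanced space reproduces the rule that neither a plain MLP node nor a single Gaussian represents well.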
Heterogeneous approaches:
1. Start from a large network with different TFs, use regularization to prune.
2. Construct a network, adding nodes selected from a pool of candidates.
3. Use very flexible TFs, force them to specialize.
Most flexible TFs
Conical functions: mixed activations
$A_C(X; W, R, \alpha, \beta) = \alpha\, I(X - R; W) + \beta\, D(X; R)$
Lorentzian: mixed activations
$C_{GL}(X; W, R, \alpha, \beta) = \frac{1}{1 + \alpha I^2(X; W) + \beta D^2(X; R)}$
Bicentral - separable functions:
$\mathrm{SBi}(X; D, b, s) = \prod_{i=1}^{N} \sigma\!\left(e^{s_i}(X_i - D_i + e^{b_i})\right)\left(1 - \sigma\!\left(e^{s_i}(X_i - D_i - e^{b_i})\right)\right)$
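A sketch of a separable bicentral function as a product of soft window factors per dimension; the exact parameterization (exponentials of log-widths b and log-slopes s) is an assumption where the slide's formula is garbled:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bicentral(X, D, b, s):
    """Product over dimensions of soft window factors:
    sigma(e^s_i (x_i - D_i + e^b_i)) * (1 - sigma(e^s_i (x_i - D_i - e^b_i)))."""
    left = sigmoid(np.exp(s) * (X - D + np.exp(b)))
    right = sigmoid(np.exp(s) * (X - D - np.exp(b)))
    return np.prod(left * (1.0 - right), axis=-1)

D = np.zeros(2)            # centre of the window
b = np.zeros(2)            # log-widths: e^0 = 1
s = np.full(2, 2.0)        # log-slopes: fairly steep edges
inside = bicentral(np.array([0.0, 0.0]), D, b, s)
outside = bicentral(np.array([5.0, 0.0]), D, b, s)
print(inside)   # close to 1 inside the window
print(outside)  # close to 0 outside it
```

Each dimension contributes a soft rectangular window, so the product carves out a hyper-box with adjustable centres, widths, and slopes, which is why bicentral nodes give much more flexible decision borders than a single sigmoid or Gaussian.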
Bicentral + rotations
6N parameters, most general.
$L(x; d, d') = \sigma(x + d)\left(1 - \sigma(x + d')\right)$
$C_K(X; D, D') = L(W \cdot X; D_N, D'_N) \prod_{i=1}^{N-1} L(X_i; D_i, D'_i)$
Box in N-1 dim x rotated window.
$\mathrm{SBi}(X; D, b, s, s', \alpha, \beta) = \prod_{i=1}^{N} \sigma\!\left(e^{s_i}(X_i + \alpha_i X_{i+1} - D_i + e^{b_i})\right)\left(1 - \sigma\!\left(e^{s'_i}(X_i + \beta_i X_{i+1} - D_i - e^{b_i})\right)\right)$
Rotation matrix with band structure makes 2x2 rotations:
$\mathrm{SBi}_R(X; D, D', \alpha, \beta, R) = \prod_{i=1}^{N} \sigma\!\left(R_i \cdot X + D_i\right)\left(1 - \sigma\!\left(R_i \cdot X + D'_i\right)\right)$
where $R_{ii} = s_i$ and $R_{i,i+1} = \alpha_i$ for $i = 1..N-1$, all other entries zero.
Some properties of TFs
For logistic functions:
$\sigma(x + b) - \sigma(x - b) = \left(1 - e^{-2b}\right)\sigma(x + b)\,\sigma(b - x)$
Renormalization of a Gaussian gives a logistic function:
$G_R(X; D, b) = \frac{G(X; D, b)}{G(X; D, b) + G(X; -D, b)} = \frac{1}{1 + \exp\!\left(-\sum_{i=1}^{N} 4 D_i X_i / b_i^2\right)} = \sigma(W \cdot X)$, where $W_i = 4D_i/b_i^2$.
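The Gaussian-renormalization identity can be checked numerically. The `gauss` helper below assumes the diagonal form G(X; D, b) = exp(-Σ (x_i - D_i)²/b_i²), with the second Gaussian mirrored through the origin at -D:

```python
import numpy as np

def gauss(X, D, b):
    """Diagonal Gaussian G(X; D, b) = exp(-sum_i (x_i - D_i)^2 / b_i^2)."""
    return np.exp(-np.sum((X - D) ** 2 / b ** 2, axis=-1))

rng = np.random.default_rng(2)
D = rng.normal(size=4)
b = rng.uniform(0.5, 2.0, size=4)
X = rng.normal(size=(100, 4))

# Renormalize against the mirror Gaussian centred at -D:
g_ratio = gauss(X, D, b) / (gauss(X, D, b) + gauss(X, -D, b))

# Logistic form with W_i = 4 D_i / b_i^2:
W = 4 * D / b ** 2
logistic = 1.0 / (1.0 + np.exp(-(X @ W)))

print(np.allclose(g_ratio, logistic))  # True
```

The quadratic terms in the two exponents cancel in the ratio, leaving only the linear cross term -4 D_i X_i / b_i², which is exactly a logistic unit with weights W_i = 4D_i/b_i².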
Example of input transformation
Minkowski's distance function:
$D(W, X)^\alpha = \sum_{i=1}^{N} d(W_i, X_i)^\alpha = \sum_{i=1}^{N} |W_i - X_i|^\alpha$
Sigmoidal activation changed to:
$W \cdot X + \theta \;\to\; d_0 - D(W, X)$
Adding a single input renormalizing the vector:
$X_e = (X, x_{N+1})$, with $\|X_e\| = \mathrm{const}$
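A sketch of the D-MLP idea above: the fan-in activation W·X + θ is replaced by d0 - D(W, X) with a Minkowski distance (α = 2 reduces to the Euclidean case). The function names and the d0 parameterization are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def minkowski_activation(X, W, d0, alpha=2.0):
    """D-MLP style activation d0 - D(W, X), with
    D(W, X) = (sum_i |W_i - x_i|^alpha)^(1/alpha)."""
    D = np.sum(np.abs(W - X) ** alpha, axis=-1) ** (1.0 / alpha)
    return d0 - D

W = np.array([1.0, -1.0])                 # weight vector acting as a prototype
X = np.array([[1.0, -1.0], [3.0, 1.0]])   # one point at W, one far away
out = sigmoid(minkowski_activation(X, W, d0=1.0))
print(out)  # highest at the prototype W, decreasing with distance
```

Varying α changes the shape of the constant-distance contours (diamonds, spheres, boxes), so a single node sweeps through a family of decision borders.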
Conclusions
Radial and sigmoidal functions are not the only choice.
StatLog report: large differences between RBF and MLP on many datasets.
Better learning cannot repair the wrong bias of the model.
Systematic investigation and taxonomy of TFs is worthwhile.
Networks should select/optimize their functions.
Open questions:
• Optimal balance between complex nodes and interactions (weights)?
• How to train heterogeneous networks?
• How to optimize nodes in constructive algorithms?
• Hierarchical, modular networks: nodes that are networks themselves.
The End ?
Perhaps the beginning ...