Transfer functions: hidden possibilities for better neural networks.
Włodzisław Duch and Norbert Jankowski Department of Computer Methods,
Nicholas Copernicus University, Torun, Poland.
http://www.phys.uni.torun.pl/kmk
Why is this an important issue?
MLPs are universal approximators - no need for other TF?
Wrong bias => poor results, complex networks.
Examples of 2-class problems:
Class 1 inside the sphere, Class 2 outside.
MLP: at least N+1 hyperplanes, O(N²) parameters.
RBF: 1 Gaussian, O(N) parameters.
Class 1 in the corner defined by the (1,1,...,1) hyperplane, Class 2 outside.
MLP: 1 hyperplane, O(N) parameters.
RBF: many Gaussians, O(N²) parameters, poor approximation.
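The sphere example above can be sketched in a few lines of numpy. The radius sqrt(N) and centre at the origin are my illustrative assumptions (the slide does not fix them): one Gaussian node with O(N) parameters separates the classes exactly, while a single hyperplane stays near chance level.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                                   # input dimension
X = rng.normal(size=(2000, N))
labels = (np.linalg.norm(X, axis=1) < np.sqrt(N)).astype(int)  # Class 1 inside

# One Gaussian node: centre + width = O(N) parameters.
center, width = np.zeros(N), np.sqrt(N)
g = np.exp(-np.linalg.norm(X - center, axis=1) ** 2 / width ** 2)
pred_rbf = (g > np.exp(-1.0)).astype(int)   # threshold at distance == width

# A single hyperplane through the origin (one MLP node) cannot enclose a sphere.
w = rng.normal(size=N)
pred_mlp = (X @ w > 0).astype(int)

print((pred_rbf == labels).mean())   # 1.0 - exact separation
print((pred_mlp == labels).mean())   # close to 0.5
```

The Gaussian threshold exp(-1) fires exactly when the distance to the centre is below the width, so the single radial node reproduces the spherical decision border that an MLP would need O(N²) parameters to approximate.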
Inspirations
Logical rule: IF x1>0 & x2>0 THEN Class 1 ELSE Class 2
is not properly represented by either MLP or RBF!
Result: decision trees and logical rules perform significantly better than MLPs on some datasets (cf. hypothyroid)!
Speed of learning and network complexity depend on the TF. Fast learning requires flexible "brain modules" - TFs.
• Biological inspirations: sigmoidal neurons are a crude approximation of the basic level of neural tissue.
• Interesting brain functions are done by interacting minicolumns, implementing complex functions.
• Modular networks: networks of networks.
• First step beyond single neurons: transfer functions providing flexible decision borders.
Transfer functions
Transfer function f(I(X)): vector activation I(X) and scalar output o(I).
1. Fan-in, scalar product activation, hyperplanes:
$I(X; W) = W \cdot X = \sum_{i=1}^{N} W_i X_i$
2. Distance functions as activations, for example Gaussian functions:
$D(X; R) = \|X - R\|$, e.g. Euclidean $D^2 = \sum_{i=1}^{N} (X_i - R_i)^2$ or Chebyshev $D = \max_i |X_i - R_i|$
3. Mixed activation functions:
$A(X; W, R) = W \cdot X + \|X - R\|$
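The three activation types can be sketched in numpy. The additive form of `mixed` with weights α, β is an assumption on my part, since the slide's exact combination is garbled:

```python
import numpy as np

def fan_in(X, W):
    """Scalar-product activation I(X; W) = W.X, giving hyperplane borders."""
    return X @ W

def distance(X, R):
    """Distance activation D(X; R) = ||X - R||, giving radial borders."""
    return np.linalg.norm(X - R, axis=-1)

def mixed(X, W, R, alpha=1.0, beta=1.0):
    """Mixed activation: a weighted sum of the two (assumed form)."""
    return alpha * fan_in(X, W) + beta * distance(X, R)

X = np.array([[1.0, 2.0], [0.5, -1.0]])
W = np.array([0.3, -0.7])
R = np.array([1.0, 1.0])
print(fan_in(X, W))    # [-1.1   0.85]
print(distance(X, R))  # [1.     2.0615...]
```

Passing each activation through the same output function (e.g. a sigmoid) yields hyperplane, radial, or intermediate decision borders from one node type.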
Taxonomy - activation f.
Taxonomy - output f.
Taxonomy - TF
TF in Neural Networks
Choices:
1. Homogeneous NN: select the best TF, try several types. Ex: RBF networks; SVM kernels (today 50=>80% change).
2. Heterogeneous NN: one network, several types of TF. Ex: Adaptive Subspace SOM (Kohonen 1995), linear subspaces; projections on a space of basis functions.
3. Input enhancement: adding fi(X) to achieve separability. Ex: functional link networks (Pao 1989), tensor products of inputs; D-MLP model.
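A sketch of input enhancement (choice 3): for the logical rule from the Inspirations slide, one added feature makes the classes linearly separable. The choice f(X) = min(x1, x2) is mine for illustration; Pao's functional links typically use products or trigonometric terms of the inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) & (X[:, 1] > 0)).astype(int)  # IF x1>0 & x2>0 THEN Class 1

# In the original 2D space no single hyperplane separates the first quadrant.
# Enhanced input (x1, x2, f(X)) with f(X) = min(x1, x2):
enhanced = np.column_stack([X, np.minimum(X[:, 0], X[:, 1])])

# The hyperplane "third coordinate > 0" now implements the rule exactly.
pred = (enhanced[:, 2] > 0).astype(int)
print((pred == y).mean())  # 1.0
```

min(x1, x2) > 0 holds exactly when both inputs are positive, so a single linear node in the enhanced space reproduces the rule that neither a plain MLP node nor a single Gaussian represents well.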
Heterogeneous approaches:
1. Start from a large network with different TFs, use regularization to prune.
2. Construct a network, adding nodes selected from a pool of candidates.
3. Use very flexible TFs, force them to specialize.
Most flexible TFs
Conical functions: mixed activations
$A_C(X; W, R, \alpha, \beta) = \alpha\, I(X - R; W) + \beta\, D(X; R)$
Lorentzian: mixed activations
$C_{GL}(X; W, R, \alpha, \beta) = \frac{1}{1 + \alpha I^2(X; W) + \beta D^2(X; R)}$
Bicentral - separable functions:
$\mathrm{SBi}(X; D, b, s) = \prod_{i=1}^{N} \sigma\!\left(e^{s_i}(X_i - D_i + e^{b_i})\right)\left(1 - \sigma\!\left(e^{s_i}(X_i - D_i - e^{b_i})\right)\right)$
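A sketch of a separable bicentral function as a product of soft window factors per dimension; the exact parameterization (exponentials of log-widths b and log-slopes s) is an assumption where the slide's formula is garbled:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bicentral(X, D, b, s):
    """Product over dimensions of soft window factors:
    sigma(e^s_i (x_i - D_i + e^b_i)) * (1 - sigma(e^s_i (x_i - D_i - e^b_i)))."""
    left = sigmoid(np.exp(s) * (X - D + np.exp(b)))
    right = sigmoid(np.exp(s) * (X - D - np.exp(b)))
    return np.prod(left * (1.0 - right), axis=-1)

D = np.zeros(2)            # centre of the window
b = np.zeros(2)            # log-widths: e^0 = 1
s = np.full(2, 2.0)        # log-slopes: fairly steep edges
inside = bicentral(np.array([0.0, 0.0]), D, b, s)
outside = bicentral(np.array([5.0, 0.0]), D, b, s)
print(inside)   # close to 1 inside the window
print(outside)  # close to 0 outside it
```

Each dimension contributes a soft rectangular window, so the product carves out a hyper-box with adjustable centres, widths, and slopes, which is why bicentral nodes give much more flexible decision borders than a single sigmoid or Gaussian.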
Bicentral + rotations
6N parameters, most general.
$L(x; d, d') = \sigma(x + d)\left(1 - \sigma(x + d')\right)$
$C_K(X; D, D') = L(W \cdot X; D_N, D'_N) \prod_{i=1}^{N-1} L(X_i; D_i, D'_i)$
Box in N-1 dim x rotated window.
$\mathrm{SBi}(X; D, b, s, s', \alpha, \beta) = \prod_{i=1}^{N} \sigma\!\left(e^{s_i}(X_i + \alpha_i X_{i+1} - D_i + e^{b_i})\right)\left(1 - \sigma\!\left(e^{s'_i}(X_i + \beta_i X_{i+1} - D_i - e^{b_i})\right)\right)$
Rotation matrix with band structure makes 2x2 rotations:
$\mathrm{SBi}_R(X; D, D', \alpha, \beta, R) = \prod_{i=1}^{N} \sigma\!\left(R_i \cdot X + D_i\right)\left(1 - \sigma\!\left(R_i \cdot X + D'_i\right)\right)$
where $R_{ii} = s_i$ and $R_{i,i+1} = \alpha_i$ for $i = 1..N-1$, all other entries zero.
Some properties of TFs
For logistic functions:
$\sigma(x + b) - \sigma(x - b) = \left(1 - e^{-2b}\right)\sigma(x + b)\,\sigma(b - x)$
Renormalization of a Gaussian gives a logistic function:
$G_R(X; D, b) = \frac{G(X; D, b)}{G(X; D, b) + G(X; -D, b)} = \frac{1}{1 + \exp\!\left(-\sum_{i=1}^{N} 4 D_i X_i / b_i^2\right)} = \sigma(W \cdot X)$, where $W_i = 4D_i/b_i^2$.
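The Gaussian-renormalization identity can be checked numerically. The `gauss` helper below assumes the diagonal form G(X; D, b) = exp(-Σ (x_i - D_i)²/b_i²), with the second Gaussian mirrored through the origin at -D:

```python
import numpy as np

def gauss(X, D, b):
    """Diagonal Gaussian G(X; D, b) = exp(-sum_i (x_i - D_i)^2 / b_i^2)."""
    return np.exp(-np.sum((X - D) ** 2 / b ** 2, axis=-1))

rng = np.random.default_rng(2)
D = rng.normal(size=4)
b = rng.uniform(0.5, 2.0, size=4)
X = rng.normal(size=(100, 4))

# Renormalize against the mirror Gaussian centred at -D:
g_ratio = gauss(X, D, b) / (gauss(X, D, b) + gauss(X, -D, b))

# Logistic form with W_i = 4 D_i / b_i^2:
W = 4 * D / b ** 2
logistic = 1.0 / (1.0 + np.exp(-(X @ W)))

print(np.allclose(g_ratio, logistic))  # True
```

The quadratic terms in the two exponents cancel in the ratio, leaving only the linear cross term -4 D_i X_i / b_i², which is exactly a logistic unit with weights W_i = 4D_i/b_i².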
Example of input transformation
Minkowski's distance function:
$D(W, X)^\alpha = \sum_{i=1}^{N} d(W_i, X_i)^\alpha = \sum_{i=1}^{N} |W_i - X_i|^\alpha$
Sigmoidal activation changed to:
$W \cdot X + \theta \;\to\; d_0 - D(W, X)$
Adding a single input renormalizing the vector:
$X_e = (X, x_{N+1})$, with $\|X_e\| = \mathrm{const}$
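A sketch of the D-MLP idea above: the fan-in activation W·X + θ is replaced by d0 - D(W, X) with a Minkowski distance (α = 2 reduces to the Euclidean case). The function names and the d0 parameterization are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def minkowski_activation(X, W, d0, alpha=2.0):
    """D-MLP style activation d0 - D(W, X), with
    D(W, X) = (sum_i |W_i - x_i|^alpha)^(1/alpha)."""
    D = np.sum(np.abs(W - X) ** alpha, axis=-1) ** (1.0 / alpha)
    return d0 - D

W = np.array([1.0, -1.0])                 # weight vector acting as a prototype
X = np.array([[1.0, -1.0], [3.0, 1.0]])   # one point at W, one far away
out = sigmoid(minkowski_activation(X, W, d0=1.0))
print(out)  # highest at the prototype W, decreasing with distance
```

Varying α changes the shape of the constant-distance contours (diamonds, spheres, boxes), so a single node sweeps through a family of decision borders.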
Conclusions
Radial and sigmoidal functions are not the only choice.
StatLog report: large differences between RBF and MLP on many datasets.
Better learning cannot repair the wrong bias of the model.
Systematic investigation and taxonomy of TFs is worthwhile.
Networks should select/optimize their functions.
Open questions:
• Optimal balance between complex nodes and interactions (weights)?
• How to train heterogeneous networks?
• How to optimize nodes in constructive algorithms?
• Hierarchical, modular networks: nodes that are networks themselves.
The End ?
Perhaps the beginning ...