Other NN Models
• Reinforcement learning (RL)
• Probabilistic neural networks
• Support vector machine (SVM)
Reinforcement learning (RL)
• Basic ideas:
  – Supervised learning (delta rule, BP):
    • samples (x, f(x)) are given to learn the function f(·)
    • a precise error can be determined and is used to drive the learning
  – Unsupervised learning (competitive, SOM, BM):
    • no target/desired output is provided to help learning
    • learning is self-organized (clustering)
  – Reinforcement learning: in between the two
    • no target output for input vectors in training samples
    • a judge/critic will evaluate the output:
      good: reward signal (+1); bad: penalty signal (-1)
• RL exists in many places
  – Originated in psychology (the conditioned reflex)
– In many applications, it is much easier to determine good/bad, right/wrong, acceptable/unacceptable than to provide precise correct answer/error.
– It is up to the learning process to improve the system’s performance based on the critic’s signal.
  – In the machine learning community there are different theories and algorithms; a major difficulty is credit/blame distribution:
    • chess playing: only a win/lose signal after many steps (multi-step)
    • soccer playing: a win/lose signal shared by many players (multi-player)
• Principle of RL
  – Let r = +1: reward (good output)
        r = -1: penalty (bad output)
  – If r = +1, the system is encouraged to continue what it is doing;
    if r = -1, the system is encouraged not to do what it is doing.
– Need to search for better output
• because r = -1 does not indicate what the good output should be.
• common method is “random search”
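The reward/penalty-driven random search above can be sketched in a few lines of Python (a toy illustration; the critic, candidate set, and function name are invented for this example and are not part of the original slides):

```python
import random

def rl_random_search(critic, candidates, trials=1000, seed=0):
    """Keep the current output while the critic rewards it (r = +1);
    on a penalty (r = -1), pick another candidate at random."""
    rng = random.Random(seed)
    y = rng.choice(candidates)
    for _ in range(trials):
        if critic(y) == +1:          # reward: continue doing what we are doing
            return y
        y = rng.choice(candidates)   # penalty: random search for a better output
    return y

# toy critic: only output 3 is judged "good"
result = rl_random_search(lambda y: +1 if y == 3 else -1, [0, 1, 2, 3, 4])
```

Note how the critic never says what the good output is; the learner only stops searching once it stumbles on an output that earns r = +1.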
• A_RP: the associative reward-and-penalty algorithm for NN RL (Barto and Anandan, 1985)
  – Architecture: a feedforward net x(k) → z(k) → y(k), with a critic evaluating the output y(k) and returning r(k)
    • input: x(k)
    • stochastic units: z(k) for random search
    • output: y(k)
  – Random search by stochastic units z_i:

      p(z_i = +1) = 1 / (1 + e^(-2·net_i / T))
      p(z_i = -1) = 1 / (1 + e^(2·net_i / T))

    where net_i is the net input to unit z_i and T is a temperature parameter;
    or let z_i obey a continuous probability distribution function;
    or let net_i include a random noise term that obeys a certain distribution.

    Key: z is not a deterministic function of x; this gives z a chance to be a good output.

  – Prepare the desired output (temporary):

      d(k) = y(k)   if r(k) = +1
      d(k) = -y(k)  if r(k) = -1
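A minimal sketch of such a stochastic unit in Python (assuming the bipolar probabilities above with temperature T; the function names are ours):

```python
import math
import random

def sample_z(net, T=1.0, rng=random):
    """Stochastic bipolar unit: P(z = +1) = 1 / (1 + exp(-2*net/T))."""
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * net / T))
    return +1 if rng.random() < p_plus else -1

def desired_output(y, r):
    """Temporary desired output: d = y if rewarded, -y if penalized."""
    return y if r == +1 else -y

rng = random.Random(0)
zs = [sample_z(3.0, T=1.0, rng=rng) for _ in range(100)]  # strongly positive net
```

With a strongly positive net input the unit outputs +1 almost every time, yet it still occasionally tries -1; raising T makes the unit more random, which is exactly what enables the search.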
  – Compute the errors at the z layer:

      e(k) = d(k) - E(z(k))

    where E(z(k)) is the expected value of z(k) (because z is a random variable).
    How to compute E(z(k)):
    • take the average of z over a period of time
    • compute it from the distribution, if possible
    • if the logistic sigmoid function g is used:

        E(z_i) = g(net_i)·(+1) + (1 - g(net_i))·(-1) = tanh(net_i / T)

  – Training:
    • Delta rule to learn the weights of the output nodes:

        Δw_ij = ρ·e_i·y_j   if r = +1
        Δw_ij = ρλ·e_i·y_j  if r = -1, with λ ≪ 1 (learn more cautiously from penalty)

    • BP or other methods to modify the weights at lower layers
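Putting the pieces together, a single-unit A_RP training step might look like this (a sketch under our reading of the slides: `rho` and `lam` are the reward and penalty learning rates, and the critic here is a toy one that simply rewards z = +1):

```python
import math
import random

def arp_step(w, x, critic, T=1.0, rho=0.1, lam=0.05, rng=random):
    """One A_RP update for a single stochastic output unit."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * net / T))
    z = +1 if rng.random() < p_plus else -1       # stochastic unit
    r = critic(z)                                 # reward/penalty signal
    d = z if r == +1 else -z                      # temporary desired output
    e = d - math.tanh(net / T)                    # e = d - E(z)
    eta = rho if r == +1 else rho * lam           # learn less from penalty
    w = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w, r

rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(300):
    w, r = arp_step(w, [1.0, 1.0], critic=lambda z: +1 if z == +1 else -1, rng=rng)
net = w[0] + w[1]   # positive net -> unit now outputs +1 with high probability
```

After training, the net input has been pushed positive, so the stochastic unit emits the rewarded output +1 with high probability.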
Probabilistic Neural Networks
1. Purpose: classify a given input pattern x into one of the pre-defined classes by the Bayesian decision rule.
  – Suppose there are k predefined classes s1, …, sk
P(si): prior probability of class si
P(x|si): conditional probability of x, given si
P(x): probability of x
    P(si|x): posterior probability of si, given x
  – Example:
    S = {s1, …, sk}, the set of all patients
    si: the set of all patients having disease si
    x: a description (manifestations) of a patient
    P(x|si): prob. a patient with disease si will have description x
    P(si|x): prob. a patient with description x will have disease si
  – Decision rule: classify x into class si if P(si|x) = max_j P(sj|x)
  – By Bayes' theorem:

      P(si|x) = P(x|si)·P(si) / P(x)

  – Because P(x) is constant:

      P(si|x) = max_j P(sj|x)  iff  P(x|si)·P(si) = max_j P(x|sj)·P(sj)

  – In PNN, the conditionals P(x|si) are learned from exemplars
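A tiny numeric illustration of the rule (the disease names and probability values are made up for this example):

```python
# P(s_i): priors; P(x|s_i): likelihood of the observed description x under each disease
priors = {"flu": 0.3, "cold": 0.6, "measles": 0.1}
likelihood = {"flu": 0.7, "cold": 0.2, "measles": 0.9}

# P(x) is the same for every class, so argmax P(s_i|x) = argmax P(x|s_i) * P(s_i)
scores = {s: likelihood[s] * priors[s] for s in priors}
best = max(scores, key=scores.get)
```

Here "flu" wins (0.7 × 0.3 = 0.21) even though "cold" has the larger prior, because the observed description x is much more likely under "flu".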
2. Estimate probabilities
  – Training exemplars: x_j^(i), the jth exemplar belonging to class si
  – Priors can be obtained either by experts' estimates or calculated from the exemplars:

      P(si) = |si| / Σ_{j=1..k} |sj|

  – Conditionals are estimated according to the Parzen estimator:

      P(x|si) = (1 / ((2π)^(m/2) σ^m n_i)) · Σ_{j=1..n_i} exp(-‖x - x_j^(i)‖² / (2σ²))

    where m: dimension of the pattern; n_i: number of exemplars in si; x: input pattern
  – Closely related to the radial basis function of a Gaussian:

      f(x) = (1 / (√(2π)·σ)) · exp(-(x - u)² / (2σ²))
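The Parzen estimator above is straightforward to code (a sketch; the function name and the toy exemplars are ours):

```python
import math

def parzen_conditional(x, exemplars, sigma=1.0):
    """Parzen (Gaussian-kernel) estimate of P(x | s_i) from the exemplars of class s_i."""
    m, n_i = len(x), len(exemplars)
    norm = (2.0 * math.pi) ** (m / 2.0) * sigma ** m * n_i
    total = sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, ex)) / (2.0 * sigma ** 2))
        for ex in exemplars
    )
    return total / norm

exemplars = [[0.0, 0.0], [1.0, 0.0]]        # toy class exemplars
p_near = parzen_conditional([0.0, 0.0], exemplars)
p_far = parzen_conditional([5.0, 5.0], exemplars)
```

As expected of a density estimate, the value is large near the exemplars and falls off quickly with distance; σ controls how much each exemplar's Gaussian bump is smoothed.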
3. PNN architecture: feedforward with 4 layers (input layer → exemplar layer → class layer → decision layer)
  • Exemplar layer: RBF nodes, one per exemplar, centered on x_j^(i):

      y_j^(i) = exp(-‖x - x_j^(i)‖² / (2σ²))

    y_j^(i) is determined by the distance between x and x_j^(i); it is large if x is close to x_j^(i)
  • Class layer: each class node connects to all exemplar nodes belonging to its class si:

      z_i = Σ_j y_j^(i)

    z_i approximates the Parzen estimate of P(x|si); z_i is large if x is close to more exemplars of si
  • Decision layer: picks the winner based on z_i·P(si)
  • If necessary, training to adjust the weights of the upper layers
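A minimal end-to-end sketch of the four layers (assuming Gaussian exemplar nodes of width σ and priors proportional to exemplar counts; the function name and toy data are ours):

```python
import math

def pnn_classify(x, class_exemplars, sigma=1.0):
    """Forward pass: exemplar layer (RBF nodes) -> class layer (sum) ->
    decision layer (argmax of z_i * P(s_i))."""
    total = sum(len(ex) for ex in class_exemplars.values())
    scores = {}
    for label, exemplars in class_exemplars.items():
        y = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, e)) / (2.0 * sigma ** 2))
             for e in exemplars]                       # exemplar layer outputs y_j^(i)
        z = sum(y)                                     # class layer: z_i = sum_j y_j^(i)
        scores[label] = z * len(exemplars) / total     # decision layer: z_i * P(s_i)
    return max(scores, key=scores.get)

classes = {"A": [[0.0, 0.0], [0.5, 0.2]],   # toy exemplars for two classes
           "B": [[5.0, 5.0], [5.2, 4.8]]}
label_near_a = pnn_classify([0.2, 0.1], classes)
label_near_b = pnn_classify([4.9, 5.1], classes)
```

Note that "learning" here is just storing the exemplars as RBF centers, which is why PNN training is fast; the cost shows up at classification time, one node per exemplar.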
4. Comments:
  – Classification by Bayes' rule
– Fast classification
– Fast learning
  – Guaranteed to approach the Bayes-optimal decision surface, provided that the class probability density functions are smooth and continuous
  – Trades nodes for time (not good with large training samples)
  – The probability density function to be represented must be smooth and continuous