Other NN Models
• Reinforcement learning (RL)
• Probabilistic neural networks
• Support vector machine (SVM)
Reinforcement learning (RL)
• Basic ideas:
  – Supervised learning (delta rule, BP):
    • samples (x, f(x)) are given to learn the function f(·)
    • a precise error can be determined and is used to drive the learning
  – Unsupervised learning (competitive, SOM, BM):
    • no target/desired output is provided to help learning
    • learning is self-organized (clustering)
  – Reinforcement learning: in between the two
    • no target output for input vectors in training samples
    • a judge/critic will evaluate the output:
      good: reward signal (+1); bad: penalty signal (-1)
• RL exists in many places
  – Originated in psychology (the conditioned reflex)
– In many applications, it is much easier to determine good/bad, right/wrong, acceptable/unacceptable than to provide precise correct answer/error.
– It is up to the learning process to improve the system’s performance based on the critic’s signal.
  – In the machine learning community there are different theories and algorithms; a major difficulty is credit/blame distribution:
    • chess playing: only a win/lose signal after many steps (multi-step)
    • soccer playing: a win/lose signal shared by many players (multi-player)
• Principle of RL
  – Let r = +1: reward (good output)
        r = -1: penalty (bad output)
  – If r = +1, the system is encouraged to continue what it is doing;
    if r = -1, the system is encouraged not to do what it is doing.
– Need to search for better output
• because r = -1 does not indicate what the good output should be.
• common method is “random search”
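The reward/penalty-driven random search above can be sketched in a few lines of Python (a toy illustration; the critic, candidate set, and function name are invented for this example and are not part of the original slides):

```python
import random

def rl_random_search(critic, candidates, trials=1000, seed=0):
    """Keep the current output while the critic rewards it (r = +1);
    on a penalty (r = -1), pick another candidate at random."""
    rng = random.Random(seed)
    y = rng.choice(candidates)
    for _ in range(trials):
        if critic(y) == +1:          # reward: continue doing what we are doing
            return y
        y = rng.choice(candidates)   # penalty: random search for a better output
    return y

# toy critic: only output 3 is judged "good"
result = rl_random_search(lambda y: +1 if y == 3 else -1, [0, 1, 2, 3, 4])
```

Note how the critic never says what the good output is; the learner only stops searching once it stumbles on an output that earns r = +1.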
• A_RP: the associative reward-and-penalty algorithm for NN RL (Barto and Anandan, 1985)
  – Architecture: a feedforward net x(k) → z(k) → y(k), with a critic evaluating the output y(k) and returning r(k)
    • input: x(k)
    • stochastic units: z(k) for random search
    • output: y(k)
  – Random search by stochastic units z_i:

      p(z_i = +1) = 1 / (1 + e^(-2·net_i / T))
      p(z_i = -1) = 1 / (1 + e^(2·net_i / T))

    where net_i is the net input to unit z_i and T is a temperature parameter;
    or let z_i obey a continuous probability distribution function;
    or let net_i include a random noise term that obeys a certain distribution.

    Key: z is not a deterministic function of x; this gives z a chance to be a good output.

  – Prepare the desired output (temporary):

      d(k) = y(k)   if r(k) = +1
      d(k) = -y(k)  if r(k) = -1
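A minimal sketch of such a stochastic unit in Python (assuming the bipolar probabilities above with temperature T; the function names are ours):

```python
import math
import random

def sample_z(net, T=1.0, rng=random):
    """Stochastic bipolar unit: P(z = +1) = 1 / (1 + exp(-2*net/T))."""
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * net / T))
    return +1 if rng.random() < p_plus else -1

def desired_output(y, r):
    """Temporary desired output: d = y if rewarded, -y if penalized."""
    return y if r == +1 else -y

rng = random.Random(0)
zs = [sample_z(3.0, T=1.0, rng=rng) for _ in range(100)]  # strongly positive net
```

With a strongly positive net input the unit outputs +1 almost every time, yet it still occasionally tries -1; raising T makes the unit more random, which is exactly what enables the search.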
  – Compute the errors at the z layer:

      e(k) = d(k) - E(z(k))

    where E(z(k)) is the expected value of z(k) (because z is a random variable).
    How to compute E(z(k)):
    • take the average of z over a period of time
    • compute it from the distribution, if possible
    • if the logistic sigmoid function g is used:

        E(z_i) = g(net_i)·(+1) + (1 - g(net_i))·(-1) = tanh(net_i / T)

  – Training:
    • Delta rule to learn the weights of the output nodes:

        Δw_ij = ρ·e_i·y_j   if r = +1
        Δw_ij = ρλ·e_i·y_j  if r = -1, with λ ≪ 1 (learn more cautiously from penalty)

    • BP or other methods to modify the weights at lower layers
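Putting the pieces together, a single-unit A_RP training step might look like this (a sketch under our reading of the slides: `rho` and `lam` are the reward and penalty learning rates, and the critic here is a toy one that simply rewards z = +1):

```python
import math
import random

def arp_step(w, x, critic, T=1.0, rho=0.1, lam=0.05, rng=random):
    """One A_RP update for a single stochastic output unit."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * net / T))
    z = +1 if rng.random() < p_plus else -1       # stochastic unit
    r = critic(z)                                 # reward/penalty signal
    d = z if r == +1 else -z                      # temporary desired output
    e = d - math.tanh(net / T)                    # e = d - E(z)
    eta = rho if r == +1 else rho * lam           # learn less from penalty
    w = [wi + eta * e * xi for wi, xi in zip(w, x)]
    return w, r

rng = random.Random(0)
w = [0.0, 0.0]
for _ in range(300):
    w, r = arp_step(w, [1.0, 1.0], critic=lambda z: +1 if z == +1 else -1, rng=rng)
net = w[0] + w[1]   # positive net -> unit now outputs +1 with high probability
```

After training, the net input has been pushed positive, so the stochastic unit emits the rewarded output +1 with high probability.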
Probabilistic Neural Networks
1. Purpose: classify a given input pattern x into one of the pre-defined classes by the Bayesian decision rule.
  – Suppose there are k predefined classes s1, …, sk
P(si): prior probability of class si
P(x|si): conditional probability of x, given si
P(x): probability of x
    P(si|x): posterior probability of si, given x
  – Example:
    S = {s1, …, sk}, the set of all patients
    si: the set of all patients having disease si
    x: a description (manifestations) of a patient
    P(x|si): prob. a patient with disease si will have description x
    P(si|x): prob. a patient with description x will have disease si
  – Decision rule: classify x into class si if P(si|x) = max_j P(sj|x)
  – By Bayes' theorem:

      P(si|x) = P(x|si)·P(si) / P(x)

  – Because P(x) is constant:

      P(si|x) = max_j P(sj|x)  iff  P(x|si)·P(si) = max_j P(x|sj)·P(sj)

  – In PNN, the conditionals P(x|si) are learned from exemplars
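A tiny numeric illustration of the rule (the disease names and probability values are made up for this example):

```python
# P(s_i): priors; P(x|s_i): likelihood of the observed description x under each disease
priors = {"flu": 0.3, "cold": 0.6, "measles": 0.1}
likelihood = {"flu": 0.7, "cold": 0.2, "measles": 0.9}

# P(x) is the same for every class, so argmax P(s_i|x) = argmax P(x|s_i) * P(s_i)
scores = {s: likelihood[s] * priors[s] for s in priors}
best = max(scores, key=scores.get)
```

Here "flu" wins (0.7 × 0.3 = 0.21) even though "cold" has the larger prior, because the observed description x is much more likely under "flu".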
2. Estimate probabilities
  – Training exemplars: x_j^(i), the jth exemplar belonging to class si
  – Priors can be obtained either by experts' estimates or calculated from the exemplars:

      P(si) = |si| / Σ_{j=1..k} |sj|

  – Conditionals are estimated according to the Parzen estimator:

      P(x|si) = (1 / ((2π)^(m/2) σ^m n_i)) · Σ_{j=1..n_i} exp(-‖x - x_j^(i)‖² / (2σ²))

    where m: dimension of the pattern; n_i: number of exemplars in si; x: input pattern
  – Closely related to the radial basis function of a Gaussian:

      f(x) = (1 / (√(2π)·σ)) · exp(-(x - u)² / (2σ²))
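The Parzen estimator above is straightforward to code (a sketch; the function name and the toy exemplars are ours):

```python
import math

def parzen_conditional(x, exemplars, sigma=1.0):
    """Parzen (Gaussian-kernel) estimate of P(x | s_i) from the exemplars of class s_i."""
    m, n_i = len(x), len(exemplars)
    norm = (2.0 * math.pi) ** (m / 2.0) * sigma ** m * n_i
    total = sum(
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, ex)) / (2.0 * sigma ** 2))
        for ex in exemplars
    )
    return total / norm

exemplars = [[0.0, 0.0], [1.0, 0.0]]        # toy class exemplars
p_near = parzen_conditional([0.0, 0.0], exemplars)
p_far = parzen_conditional([5.0, 5.0], exemplars)
```

As expected of a density estimate, the value is large near the exemplars and falls off quickly with distance; σ controls how much each exemplar's Gaussian bump is smoothed.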
3. PNN architecture: feedforward with 4 layers (input layer → exemplar layer → class layer → decision layer)
  • Exemplar layer: RBF nodes, one per exemplar, centered on x_j^(i):

      y_j^(i) = exp(-‖x - x_j^(i)‖² / (2σ²))

    y_j^(i) is determined by the distance between x and x_j^(i); it is large if x is close to x_j^(i)
  • Class layer: each class node connects to all exemplar nodes belonging to its class si:

      z_i = Σ_j y_j^(i)

    z_i approximates the Parzen estimate of P(x|si); z_i is large if x is close to more exemplars of si
  • Decision layer: picks the winner based on z_i·P(si)
  • If necessary, training to adjust the weights of the upper layers
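A minimal end-to-end sketch of the four layers (assuming Gaussian exemplar nodes of width σ and priors proportional to exemplar counts; the function name and toy data are ours):

```python
import math

def pnn_classify(x, class_exemplars, sigma=1.0):
    """Forward pass: exemplar layer (RBF nodes) -> class layer (sum) ->
    decision layer (argmax of z_i * P(s_i))."""
    total = sum(len(ex) for ex in class_exemplars.values())
    scores = {}
    for label, exemplars in class_exemplars.items():
        y = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, e)) / (2.0 * sigma ** 2))
             for e in exemplars]                       # exemplar layer outputs y_j^(i)
        z = sum(y)                                     # class layer: z_i = sum_j y_j^(i)
        scores[label] = z * len(exemplars) / total     # decision layer: z_i * P(s_i)
    return max(scores, key=scores.get)

classes = {"A": [[0.0, 0.0], [0.5, 0.2]],   # toy exemplars for two classes
           "B": [[5.0, 5.0], [5.2, 4.8]]}
label_near_a = pnn_classify([0.2, 0.1], classes)
label_near_b = pnn_classify([4.9, 5.1], classes)
```

Note that "learning" here is just storing the exemplars as RBF centers, which is why PNN training is fast; the cost shows up at classification time, one node per exemplar.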
4. Comments:
  – Classification by Bayes' rule
– Fast classification
– Fast learning
  – Guaranteed to approach the Bayes-optimal decision surface, provided that the class probability density functions are smooth and continuous
  – Trades nodes for time (not good with large training samples)
  – The probability density function to be represented must be smooth and continuous