MLP
The Multi-layer Perceptron

Dr. Syed Imtiyaz Hassan
Assistant Professor, Department of CSE, Jamia Hamdard (Deemed to be University), New Delhi, India.
https://Syedimtiyazhassan.org
s.imtiyaz@jamiahamdard.ac.in
http://www.jamiahamdard.edu
XOR Revisited
SOLUTION USING MLP
• A single-layer perceptron cannot represent XOR, because the two classes are not linearly separable; adding one hidden layer makes the problem solvable.
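For concreteness, a minimal numpy sketch of the MLP solution with hand-picked step-unit weights (the specific weight values are my own illustrative choices, not taken from the slides): one hidden unit computes OR, the other AND, and the output unit fires for "OR and not AND", which is exactly XOR.

```python
import numpy as np

def step(x):
    return (x >= 0).astype(int)

# Hand-picked weights for a 2-2-1 network (illustrative values).
W_hidden = np.array([[1.0, 1.0],    # hidden unit 1: OR
                     [1.0, 1.0]])   # hidden unit 2: AND
b_hidden = np.array([-0.5, -1.5])   # OR threshold 0.5, AND threshold 1.5
w_out = np.array([1.0, -1.0])       # output = OR AND NOT(AND)  ->  XOR
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)
    y = step(w_out @ h + b_out)
    print(x, "->", y)
```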
The Sigmoid Threshold Unit
• A unit that computes o = σ(w · x), where σ(y) = 1 / (1 + e^(−y)) is the sigmoid (logistic) function.
• Unlike the step threshold, σ is differentiable, with the convenient derivative dσ(y)/dy = σ(y)(1 − σ(y)).
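A minimal numpy sketch of the unit and of the derivative identity that backpropagation relies on later (the function names are mine):

```python
import numpy as np

def sigmoid(net):
    # sigma(net) = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_output(w, x):
    # A sigmoid threshold unit: squash the weighted sum of the inputs.
    return sigmoid(np.dot(w, x))

def sigmoid_derivative(net):
    # d sigma / d net = sigma(net) * (1 - sigma(net))
    s = sigmoid(net)
    return s * (1.0 - s)
```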
Adaline
• Adaptive Linear Element
• Proposed by Widrow & Hoff, 1960
Adaline
• x is the vector of input voltages; w is the conductance of controllable resistors.
• Madaline (Many Adalines): Adalines connected through AND logic.
• Adaline and Madaline are single-layer networks.
Adaline
DELTA RULE
• Also known as the LMS rule or the Widrow-Hoff rule.
• Update formula: w ← w + η (t − o) x, where o = w · x is the linear output, t the target, and η the learning rate.
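A sketch of a single LMS update, assuming a linear unit o = w · x (the helper name and learning rate are illustrative):

```python
import numpy as np

def lms_update(w, x, t, eta=0.1):
    """One Widrow-Hoff (LMS / delta rule) step: w <- w + eta * (t - o) * x,
    where o = w . x is the linear unit's output and eta the learning rate."""
    o = np.dot(w, x)
    return w + eta * (t - o) * x
```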
MLP Architecture
The 3-3-2 Network (3 inputs, 3 hidden units, 2 output units)
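A sketch of the forward pass through such a 3-3-2 network, assuming sigmoid units and omitting bias terms for brevity (random weights stand in for trained ones):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 3-3-2 network: 3 inputs, a hidden layer of 3 sigmoid units,
# and 2 sigmoid output units. Weights are random placeholders.
rng = np.random.default_rng(0)
W_hidden = rng.normal(scale=0.1, size=(3, 3))
W_output = rng.normal(scale=0.1, size=(2, 3))

def forward(x):
    h = sigmoid(W_hidden @ x)     # 3 hidden activations
    return sigmoid(W_output @ h)  # 2 outputs

print(forward(np.array([0.5, -0.2, 0.8])))
```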
Gradient Descent
BASIS FOR THE BACKPROPAGATION ALGORITHM
Notation:
• k = number of output units
• d = a training example
• t_d = target output for training example d
• o_d = output of the unit for training example d
• D = the set of training examples
• Error = half of the summed squared difference between targets and outputs:
E(w) ≡ ½ Σ_{d∈D} (t_d − o_d)²
• E is a function of the weight vector w, because the linear unit's output o depends on the weights.
Gradient Descent
BASIS FOR THE BACKPROPAGATION ALGORITHM
• Gradient of E w.r.t. w:
∇E(w) = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n]
• Training rule: w ← w + Δw, where Δw = −η ∇E(w) and η is the learning rate.
Gradient Descent
BASIS FOR THE BACKPROPAGATION ALGORITHM
• Training rule (in component form):
w_i ← w_i + Δw_i, where Δw_i = −η ∂E/∂w_i
Gradient Descent
BASIS FOR THE BACKPROPAGATION ALGORITHM
• Gradient of the squared error for a linear unit:
∂E/∂w_i = ∂/∂w_i ½ Σ_{d∈D} (t_d − o_d)² = Σ_{d∈D} (t_d − o_d)(−x_id)
Gradient Descent
BASIS FOR THE BACKPROPAGATION ALGORITHM
• Substituting this gradient into the training rule gives the batch weight update:
Δw_i = η Σ_{d∈D} (t_d − o_d) x_id
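Putting the pieces together, a sketch of the batch gradient-descent loop for a single linear unit (parameter names and defaults are mine):

```python
import numpy as np

def gradient_descent(X, T, eta=0.05, epochs=100):
    """Batch gradient descent for a single linear unit.
    X: (N, n) matrix of training inputs; T: (N,) vector of targets."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        O = X @ w                 # o_d for every training example d
        grad_E = -(T - O) @ X     # dE/dw_i = -sum_d (t_d - o_d) x_id
        w -= eta * grad_E         # w <- w - eta * grad E(w)
    return w
```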
Gradient Descent
BASIS FOR THE BACKPROPAGATION ALGORITHM
• A differentiable threshold unit: the sigmoid unit, o = σ(w · x). Because σ′(y) = σ(y)(1 − σ(y)), gradient descent can be applied through the nonlinearity.
Multi-Layer Perceptron
FEEDFORWARD BACKPROPAGATION
• Networks with multiple output units rather than a single unit
Backpropagation Algorithm
• The stochastic gradient descent version of the backpropagation algorithm, for feedforward networks containing two layers of sigmoid units.
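A minimal sketch of that stochastic version for a two-layer sigmoid network (the array shapes, names, and the omission of bias terms are my simplifications):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_epoch(X, T, W_h, W_o, eta=0.05):
    """One epoch of stochastic-gradient backpropagation for a network with
    one layer of hidden sigmoid units (W_h) and one layer of output sigmoid
    units (W_o). Bias terms are omitted for brevity."""
    for x, t in zip(X, T):
        # Forward pass.
        h = sigmoid(W_h @ x)                    # hidden activations
        o = sigmoid(W_o @ h)                    # output activations
        # Backward pass: error terms for output and hidden units.
        delta_o = o * (1 - o) * (t - o)
        delta_h = h * (1 - h) * (W_o.T @ delta_o)
        # Update each weight after this one example (stochastic version).
        W_o += eta * np.outer(delta_o, h)
        W_h += eta * np.outer(delta_h, x)
    return W_h, W_o
```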
Mini-batches
CHANCE TO ESCAPE FROM LOCAL MINIMA
• The batch algorithm converges to a local minimum faster than the sequential algorithm.
• Mini-batch training splits the training set into random batches:
• estimate the gradient on one subset of the training set,
• perform a weight update, and then
• use the next subset to estimate a new gradient and use that for the next weight update,
• repeating until all of the training set has been used.
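A sketch of that loop, assuming a hypothetical grad_fn that returns the gradient estimate for a batch (all names and defaults are illustrative):

```python
import numpy as np

def minibatch_train(X, T, w, grad_fn, eta=0.05, batch_size=32, epochs=10):
    """Mini-batch loop: shuffle, split the training set into batches, and
    perform one weight update per batch. grad_fn(X_b, T_b, w) is assumed
    to return the gradient estimate for that batch."""
    rng = np.random.default_rng(0)
    N = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(N)                 # random batches each epoch
        for start in range(0, N, batch_size):
            idx = order[start:start + batch_size]  # next subset
            w -= eta * grad_fn(X[idx], T[idx], w)  # update, then move on
    return w
```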
Stochastic Gradient Descent
FOR LARGE TRAINING SETS
• The extreme version of the mini-batch idea: use just one training example to estimate the gradient at each iteration, picking that example uniformly at random from the training set.
• Often used when the training set is very large.
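The same loop taken to its extreme, again using the hypothetical grad_fn from the mini-batch sketch:

```python
import numpy as np

def sgd_epoch(X, T, w, grad_fn, eta=0.05):
    """Pure stochastic gradient descent: estimate the gradient from one
    training example picked uniformly at random (a mini-batch of size 1)."""
    rng = np.random.default_rng(0)
    N = X.shape[0]
    for _ in range(N):
        i = rng.integers(N)                        # one example, at random
        w -= eta * grad_fn(X[i:i + 1], T[i:i + 1], w)
    return w
```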
Adding Momentum
• The weight update on the nth iteration depends partially on the update that occurred during the (n − 1)th iteration:
Δw_i(n) = −η ∂E/∂w_i + α Δw_i(n − 1), where 0 ≤ α < 1 is the momentum constant.
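A sketch of a single momentum step (the names and the value of alpha are illustrative):

```python
import numpy as np

def momentum_step(w, grad_E, prev_delta, eta=0.05, alpha=0.9):
    """One gradient step with momentum: the new update blends the current
    gradient with the previous update (alpha is the momentum constant)."""
    delta_w = -eta * grad_E + alpha * prev_delta  # depends on step n-1
    return w + delta_w, delta_w
```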
RBFN
• Radial Basis Function Network: an ANN that uses radial basis functions as activation functions.
• The output of the network is a linear combination of RBFs of the inputs and neuron parameters.
• An RBF is a real-valued function whose value depends only on the distance from the origin (or from a chosen centre).
RBFN
• Radial basis function (RBF) networks typically have three layers: an input layer, a hidden layer with a non-linear RBF activation function, and a linear output layer.
RBFN
Common radial basis functions:
• Euclidean
• Gaussian
• Multiquadric
• …
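A sketch of the forward pass using the Gaussian basis, φ(r) = exp(−r² / 2σ²); the centres and output weights are assumed to be given, and training them is not shown:

```python
import numpy as np

def gaussian_rbf(x, center, sigma=1.0):
    """Gaussian RBF: its value depends only on the distance ||x - center||."""
    r = np.linalg.norm(x - center)
    return np.exp(-(r ** 2) / (2.0 * sigma ** 2))

def rbfn_forward(x, centers, weights, sigma=1.0):
    """RBFN forward pass: non-linear RBF hidden layer followed by a
    linear output layer (a linear combination of the RBF activations)."""
    phi = np.array([gaussian_rbf(x, c, sigma) for c in centers])
    return weights @ phi
```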
ART
• Adaptive Resonance Theory
• Developed by Stephen Grossberg and Gail Carpenter in 1987.
• The basic ART system is an unsupervised learning model.
• Always open to new learning (adaptive) without losing the old patterns (resonance).
ART Operating Principle
• Recognition phase: the input vector is compared with the classification presented at every node in the output layer. The output of a neuron becomes "1" if it best matches the classification applied; otherwise it becomes "0".
• Comparison phase: the input vector is compared with the comparison-layer vector. The condition for reset is that the degree of similarity is less than the vigilance parameter.
ART Operating Principle
• Search phase: the network searches for a reset as well as for the match found in the phases above. If there is no reset and the match is quite good, the classification is over. Otherwise, the process is repeated and other stored patterns are tried until the correct match is found.
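A much-simplified, ART-1-flavoured sketch of the recognition / comparison / search loop for binary inputs (this compresses the F1/F2 layer dynamics of real ART into a few lines; rho is the vigilance parameter, and the choice function is the standard |x ∧ w| / (β + |w|) form):

```python
import numpy as np

def art1_classify(inputs, rho=0.7, beta=1.0):
    """Much-simplified ART-1-style clustering of binary vectors.
    Illustrative only; real ART 1 has explicit F1/F2 layer dynamics."""
    prototypes = []                      # stored patterns (categories)
    labels = []
    for x in inputs:
        x = np.asarray(x)
        # Recognition: rank categories by the choice function.
        order = sorted(range(len(prototypes)),
                       key=lambda j: -np.sum(np.minimum(x, prototypes[j]))
                                      / (beta + np.sum(prototypes[j])))
        for j in order:
            # Comparison: vigilance test against the best remaining match.
            match = np.sum(np.minimum(x, prototypes[j])) / max(np.sum(x), 1)
            if match >= rho:
                prototypes[j] = np.minimum(x, prototypes[j])  # resonance
                labels.append(j)
                break
        else:
            # Search exhausted without resonance: create a new category.
            prototypes.append(x.copy())
            labels.append(len(prototypes) - 1)
    return labels, prototypes
```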
ART Types
• ART 1
• ART 2
• ARTMAP (Predictive ART)
• Fuzzy ART
• Fuzzy ARTMAP
• Gaussian ART
• Gaussian ARTMAP
Summary
• Adaline
• Delta Rule
• Gradient Descent
• Backpropagation
• RBFN
• ART
Thank You