Adaptive Resonance Theory (ART)
Shahid Rajaee Teacher Training University, Faculty of Computer Engineering
PRESENTED BY:
Amir Masoud Sefidian
OUTLINE
• Introduction (Stability-Plasticity Dilemma)
• ART Network
• ART Types
• Basic ART network Architecture
• ART Algorithm and Learning
• ART Computational Example
• ART Application
• Conclusion
• Main References
Stability-Plasticity Dilemma (SPD)
• Stability: the system's behaviour doesn't change after irrelevant events.
• Plasticity: the system adapts its behaviour in response to significant events.
• The dilemma: how to achieve stability without rigidity and plasticity without chaos?
▫ Ongoing learning capability
▫ Preservation of learned knowledge
The real world confronts a learning system with continuously changing data, so every learning system faces the plasticity-stability dilemma: how can we create a machine that can act and navigate in a world that is constantly changing?
SPD (Contd.)
Every learning system faces the plasticity-stability dilemma, which poses a few questions:
• How can we continue to quickly learn new things about the environment without forgetting what we have already learned?
• How can a learning system remain plastic (adaptive) in response to significant input yet stable in response to irrelevant input?
• How can a neural network remain plastic enough to learn new patterns and yet maintain the stability of the already learned patterns?
• How does the system know when to switch between its plastic and stable modes?
Back-propagation Drawbacks
• The back-propagation algorithm suffers from a stability problem.
• Once a back-propagation network is trained, the number of hidden neurons and the weights are fixed.
• The network cannot learn from new patterns unless it is retrained from scratch; back-propagation networks therefore lack plasticity.
• Assuming the number of hidden neurons is kept constant, the plasticity problem can be addressed by retraining the network on the new patterns using an on-line learning rule.
• However, this causes the network to rapidly forget old knowledge; such an algorithm is not stable.
• Gail Carpenter and Stephen Grossberg (Boston University) developed Adaptive Resonance Theory in 1976 as a learning model that answers this dilemma.
• ART networks tackle the stability-plasticity dilemma:
▫ They maintain the plasticity required to learn new patterns while preventing the modification of patterns that have been learned previously.
▫ No stored pattern is modified if the input does not match any stored pattern.
▫ Plasticity: they can always adapt to unknown inputs (by creating a new cluster with a new weight vector) if the given input cannot be classified by the existing clusters (a computational corollary of the biological model of neural plasticity).
▫ Stability: existing clusters are not deleted by the introduction of new inputs (new clusters are simply created in addition to the old ones).
▫ The basic ART system is an unsupervised learning model.
The key innovation: "expectations"
As each input is presented to the network, it is compared with the prototype vector that most closely matches it (the expectation). If the match between the prototype and the input vector is NOT adequate, a new prototype is selected. In this way, previously learned memories (prototypes) are not eroded by new learning.
ART Networks (Grossberg, 1976)

Unsupervised ART learning:
• ART1, ART2 (Carpenter & Grossberg, 1987)
• Fuzzy ART (Carpenter, Grossberg, et al., 1991)
• Simplified ART (Baraldi & Alpaydin, 1998)

Supervised ART learning:
• ARTMAP (Carpenter, Grossberg, et al., 1991)
• Fuzzy ARTMAP (Carpenter, Grossberg, et al., 1991)
• Gaussian ARTMAP (Williamson, 1992)
• Simplified ARTMAP (Kasuba, 1993)
• Mahalanobis Distance Based ARTMAP (Vuskovic & Du, 2001; Vuskovic, Xu & Du, 2002)
• ART1: the simplest variety of ART networks, accepting only binary inputs.
• ART2: supports continuous inputs.
• ART3: a refinement of both models.
• Fuzzy ART: incorporates fuzzy logic into ART's pattern recognition.
• ARTMAP, also known as Predictive ART, combines two slightly modified ART1 or ART2 units into a supervised learning structure.
• Fuzzy ARTMAP is ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficacy.
Basic ART Network Architecture
The basic ART system is an unsupervised learning model. It typically consists of:
1. a comparison field (F1),
2. a recognition field (F2) composed of neurons,
3. a vigilance parameter, and
4. a reset module.
Each F2 node is in one of three states:
• Active
• Inactive, but available to participate in competition
• Inhibited, and prevented from participating in competition
ART Subsystems
Layer 1: comparison of the input pattern and the expectation.
L1-L2 connections: perform the clustering operation; each row of weights is a prototype pattern.
Layer 2: competition (contrast enhancement) with a winner-take-all learning strategy.
L2-L1 connections: perform pattern recall (the expectation).
Orienting subsystem: causes a reset when the expectation does not match the input pattern.
The degree of similarity required for patterns to be assigned to the same cluster unit is controlled by a user-defined gain control, known as the vigilance parameter.
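The two weight pathways can be sketched in a few lines of NumPy (a sketch with made-up prototype values; the matrices B and T are illustrative, not taken from the slides):

```python
import numpy as np

# Sketch of the two weight pathways: columns of B are bottom-up
# (L1 -> L2) weights used for clustering, and rows of T are top-down
# (L2 -> L1) prototypes recalled as "expectations".
B = np.array([[0.4, 0.0],
              [0.4, 0.0],
              [0.0, 0.4],
              [0.0, 0.4]])          # one column per Layer 2 node
T = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])        # one prototype row per Layer 2 node

x = np.array([1, 1, 0, 1])          # binary input presented to Layer 1
y = x @ B                           # L1 -> L2: inner product per prototype
winner = int(np.argmax(y))          # Layer 2 competition (winner-take-all)
expectation = T[winner]             # L2 -> L1: recalled prototype pattern
```

The orienting subsystem would then compare `expectation` with `x` to decide whether to reset.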
ART Algorithm
A new pattern enters the categorisation loop (recognition, then comparison):
• The incoming pattern is matched against the stored cluster templates.
• If it is close enough to a stored template (known), it joins the best-matching cluster and the winner node's weights are adapted.
• If not (unknown), a new uncommitted node is initialised with the pattern as its template.
Recognition Phase (in Layer 2)
• Forward transmission via bottom-up weights (inner product).
• The best-matching node fires (winner-take-all layer).

Comparison Phase (in Layer 1)
• Backward transmission via top-down weights.
• Vigilance test: the class template is matched against the input pattern.
• If the pattern is close enough to the template, categorisation was successful and "resonance" is achieved.
• If it is not close enough, the winner neuron is reset and the next-best match is tried (the reset inhibits the current winning neuron, and the current expectation is removed).
• A new competition is then performed in Layer 2, while the previous winning neuron is disabled.
• The new winning neuron in Layer 2 projects a new expectation onto Layer 1 through the L2-L1 connections.
• This process continues until the L2-L1 expectation provides a close enough match to the input pattern.
• The process of matching, and subsequent adaptation, is referred to as resonance.
Step 1: Send the input from the F1 layer to the F2 layer for processing. The first node within the F2 layer is chosen as the closest match to the input and a hypothesis is formed. This hypothesis represents what the node will look like after learning has occurred, assuming it is the correct node to be updated.
Step 2: Assume node j is the winner. If T_j(I*) > ρ, the hypothesis is accepted and the input is assigned to that node. Otherwise, the process moves on to Step 3.
Step 3: If the hypothesis is rejected, a "reset" command is sent back to the F2 layer. In this situation, the jth node within F2 is no longer a candidate, so the process repeats for node j+1.
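The recognition-comparison-reset cycle can be sketched as follows (a minimal NumPy sketch; the function name `art1_categorise` and the array layout are assumptions, not from the original slides):

```python
import numpy as np

def art1_categorise(x, B, T, rho):
    """One ART1 search cycle (sketch): bottom-up recognition,
    top-down comparison, and reset until resonance or exhaustion.
    Returns the index of the resonating F2 node, or None if every
    committed node fails the vigilance test (a new cluster is needed)."""
    active = list(range(T.shape[0]))           # F2 nodes allowed to compete
    while active:
        y = x @ B[:, active]                   # recognition: inner products
        j = active[int(np.argmax(y))]          # winner-take-all in Layer 2
        if (T[j] * x).sum() / x.sum() >= rho:  # comparison: |x AND t_j| / |x|
            return j                           # vigilance passed: resonance
        active.remove(j)                       # reset: inhibit winner, retry
    return None
```

A `None` result corresponds to the "unknown" branch of the algorithm: no committed node resonates, so a new node must be initialised with the input as its template.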
Learning in ART1
Updates for both the bottom-up and top-down weights are controlled by differential equations; in the fast-learning limit they reduce to simple algebraic rules. Assuming the Jth node is the winner, so that only its connected weights are updated, there are two separate learning laws.

L2-L1 (top-down) connections:

t_{J,l}(new) = t_{J,l}(old) · x_l

L1-L2 (bottom-up) connections; the bottom-up weights are normalised so that they are smaller than or equal to this value:

b_{l,J}(new) = t_{J,l}(new) / (0.5 + Σ_k t_{J,k}(new))

Initial weights for ART1:
• top-down weights are initialized to 1: t_{j,l}(0) = 1
• bottom-up weights are initialized to b_{l,j}(0) = 1/(1 + n), where n is the number of input neurons.
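In the fast-learning limit these laws reduce to a couple of lines (a sketch; the helper names are illustrative, and the 0.5 offset in the denominator follows the update rule used in the worked example later in the deck):

```python
import numpy as np

def art1_init(n_inputs, n_nodes):
    """Initial ART1 weights: top-down t_{j,l}(0) = 1 and
    bottom-up b_{l,j}(0) = 1/(1 + n_inputs)."""
    T = np.ones((n_nodes, n_inputs))
    B = np.full((n_inputs, n_nodes), 1.0 / (1 + n_inputs))
    return B, T

def art1_update(x, j, B, T):
    """Fast-learning update for winning node j: the top-down template
    becomes its intersection with the input, and the bottom-up weights
    are that template renormalised by 0.5 + |template|."""
    T[j] = T[j] * x                        # t_{J,l} <- t_{J,l} * x_l
    B[:, j] = T[j] / (0.5 + T[j].sum())    # b_{l,J} <- t_{J,l} / (0.5 + sum)
```

With n = 7 this reproduces the first learning step of the example below: presenting (1, 1, 0, 0, 0, 0, 1) to a fresh node leaves top-down weights equal to the input and sets the bottom-up weights to 1/3.5 at the active positions.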
Vigilance Threshold
• A vigilance parameter ρ determines the tolerance of the matching process.
• The vigilance threshold sets the granularity of clustering: it defines the amount of attraction of each prototype.
• Low threshold:
▫ Large mismatches are accepted.
▫ Few large clusters.
▫ Misclassifications are more likely.
▫ The algorithm is more willing to accept input vectors into clusters (i.e., the definition of similarity is LESS strict).
• High threshold:
▫ Only small mismatches are accepted.
▫ Many small clusters.
▫ Higher precision.
▫ The algorithm is more "picky" about assigning input vectors to clusters (i.e., the definition of similarity is MORE strict).
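The effect of ρ can be seen directly on the match ratio |x AND t| / |x| used in the vigilance test (the template and input here mirror the worked example below; the two threshold values are illustrative):

```python
import numpy as np

t = np.array([0, 0, 1, 1, 1, 1, 0])   # a stored prototype (template)
x = np.array([1, 0, 1, 1, 1, 1, 0])   # an input differing in one position

match = (t & x).sum() / x.sum()       # vigilance ratio |x AND t| / |x| = 4/5

low_rho, high_rho = 0.7, 0.9
accepted_low = match >= low_rho       # tolerant: input joins the cluster
accepted_high = match >= high_rho     # strict: rejected, new cluster formed
```

The same input is thus clustered coarsely or finely depending solely on ρ.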
Example:
For this example, let us assume that we have an ART1 network with 7 input neurons (n = 7) and initially one output neuron.
Our input vectors are
{(1, 1, 0, 0, 0, 0, 1), (0, 0, 1, 1, 1, 1, 0), (1, 0, 1, 1, 1, 1, 0), (0, 0, 0, 1, 1, 1, 0), (1, 1, 0, 1, 1, 1, 0)}
and the vigilance parameter is ρ = 0.7.
Initially, all top-down weights are set to t_{1,l}(0) = 1, and all bottom-up weights are set to b_{l,1}(0) = 1/8.
ART Example Computation
For the first input vector, (1, 1, 0, 0, 0, 0, 1), we get:

y_1 = Σ_{l=1}^{7} b_{l,1}(0) x_l = (1/8)(1 + 1 + 0 + 0 + 0 + 0 + 1) = 3/8

Clearly, y1 is the winner (there are no competitors). Since we have:

(Σ_{l=1}^{7} t_{1,l}(0) x_l) / (Σ_{l=1}^{7} x_l) = 3/3 = 1 ≥ 0.7,

the vigilance condition is satisfied and we get the following new weights:

b_{1,1}(1) = b_{2,1}(1) = b_{7,1}(1) = 1/(0.5 + 3) = 1/3.5
b_{3,1}(1) = b_{4,1}(1) = b_{5,1}(1) = b_{6,1}(1) = 0

Also, we have:

t_{1,l}(1) = t_{1,l}(0) x_l

We can express the updated weights as matrices:

B(1)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ]
T(1) = [ 1, 1, 0, 0, 0, 0, 1 ]

Now we have finished the first learning step and proceed by presenting the next input vector.
For the second input vector, (0, 0, 1, 1, 1, 1, 0), we get:

y_1 = 0·(1/3.5) + 0·(1/3.5) + 1·0 + 1·0 + 1·0 + 1·0 + 0·(1/3.5) = 0

Of course, y1 is still the winner. However, this time we do not reach the vigilance threshold:

(Σ_{l=1}^{7} t_{1,l} x_l) / (Σ_{l=1}^{7} x_l) = 0/4 = 0 < 0.7.

This means that we have to generate a second node in the output layer that represents the current input. Therefore, the top-down weights of the new node will be identical to the current input vector.

The new unit's bottom-up weights are set to zero in the positions where the input has zeroes as well. The remaining weights are set to:

1/(0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 1/4.5

This gives us the following updated weight matrices:

B(2)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 1/4.5, 1/4.5, 1/4.5, 1/4.5, 0 ]
T(2) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 1, 1, 1, 1, 0 ]
For the third input vector, (1, 0, 1, 1, 1, 1, 0), we have:

y_1 = 1/3.5 ;  y_2 = 4/4.5

Here, y2 is the clear winner. This time we exceed the vigilance threshold again:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 4/5 = 0.8 ≥ 0.7.

Therefore, we adapt the second node's weights. Each top-down weight is multiplied by the corresponding element of the current input, and the node's bottom-up weights are set to the top-down weights divided by (0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 4.5.

It turns out that, in the current case, these updates do not result in any weight changes at all:

B(3)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 1/4.5, 1/4.5, 1/4.5, 1/4.5, 0 ]
T(3) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 1, 1, 1, 1, 0 ]
For the fourth input vector, (0, 0, 0, 1, 1, 1, 0), it is:

y_1 = 0 ;  y_2 = 3/4.5

Again, y2 is the winner. The vigilance test succeeds once again:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/3 = 1 ≥ 0.7.

Therefore, we adapt the second node's weights. As usual, each top-down weight is multiplied by the corresponding element of the current input, and the node's bottom-up weights are set to the top-down weights divided by (0.5 + 0 + 0 + 0 + 1 + 1 + 1 + 0) = 3.5. This gives us the following new weight matrices:

B(4)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0 ]
T(4) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 0, 1, 1, 1, 0 ]
Finally, the fifth input vector, (1, 1, 0, 1, 1, 1, 0), gives us:

y_1 = 2/3.5 ;  y_2 = 3/3.5

Once again, y2 is the winner. The vigilance test fails this time:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/5 = 0.6 < 0.7.

This means that the active set A is reduced to contain only the first node, which becomes the uncontested winner. The vigilance test fails for the first unit as well:

(Σ_{l=1}^{7} t_{1,l} x_l) / (Σ_{l=1}^{7} x_l) = 2/5 = 0.4 < 0.7.

We thus have to create a third output neuron, which gives us the following new weight matrices:

B(5)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0 ;
           1/5.5, 1/5.5, 0, 1/5.5, 1/5.5, 1/5.5, 0 ]
T(5) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 0, 1, 1, 1, 0 ;
         1, 1, 0, 1, 1, 1, 0 ]
In the second epoch, the first input vector, (1, 1, 0, 0, 0, 0, 1), gives us:

y_1 = 3/3.5 ;  y_2 = 0 ;  y_3 = 2/5.5

Here, y1 is the winner, and the vigilance test succeeds:

(Σ_{l=1}^{7} t_{1,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/3 = 1 ≥ 0.7.

Since the current input is identical to the winner's top-down weights, no weight update happens.
The second input vector, (0, 0, 1, 1, 1, 1, 0), results in:

y_1 = 0 ;  y_2 = 3/3.5 ;  y_3 = 3/5.5

Now y2 is the winner, and the vigilance test succeeds:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/4 = 0.75 ≥ 0.7.

Because the intersection of the current input with the winner's top-down weights equals the existing template (0, 0, 0, 1, 1, 1, 0), no weight update occurs.
The third input vector, (1, 0, 1, 1, 1, 1, 0), gives us:

y_1 = 1/3.5 ;  y_2 = 3/3.5 ;  y_3 = 4/5.5

Once again, y2 is the winner, but this time the vigilance test fails:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/5 = 0.6 < 0.7.

This means that the active set is reduced to A = {1, 3}. Since y3 > y1, the third node is the new winner. The third node does satisfy the vigilance threshold:

(Σ_{l=1}^{7} t_{3,l} x_l) / (Σ_{l=1}^{7} x_l) = 4/5 = 0.8 ≥ 0.7.

This gives us the following updated weight matrices:

B(8)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0 ;
           1/4.5, 0, 0, 1/4.5, 1/4.5, 1/4.5, 0 ]
T(8) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 0, 1, 1, 1, 0 ;
         1, 0, 0, 1, 1, 1, 0 ]
For the fourth vector, (0, 0, 0, 1, 1, 1, 0), the second node wins and passes the vigilance test, but no weight changes occur.
For the fifth vector, (1, 1, 0, 1, 1, 1, 0), the third output neuron now wins (y_3 = 4/4.5 exceeds y_2 = 3/3.5); it passes the vigilance test but does not lead to any weight modifications.
Further presentations of the five sample vectors do not lead to any weight changes; the network has thus stabilized.
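The whole run can be reproduced with a compact ART1 implementation (a sketch assembled from the update rules above; the function name `art1` and the data layout are assumptions, not from the slides):

```python
import numpy as np

def art1(inputs, n_inputs, rho=0.7, epochs=2):
    """Minimal ART1 clusterer (sketch): winner-take-all recognition,
    vigilance test with reset, fast-learning weight updates, and a
    new committed node whenever no existing node passes the test."""
    T = np.zeros((0, n_inputs))               # top-down templates (rows)
    B = np.zeros((n_inputs, 0))               # bottom-up weights (columns)
    labels = []
    for _ in range(epochs):
        labels = []
        for pattern in inputs:
            x = np.asarray(pattern, dtype=float)
            active = list(range(T.shape[0]))
            winner = None
            while active:
                j = active[int(np.argmax(x @ B[:, active]))]
                if (T[j] * x).sum() / x.sum() >= rho:
                    winner = j                # resonance achieved
                    break
                active.remove(j)              # reset: inhibit and re-compete
            if winner is None:                # commit a new node for x
                T = np.vstack([T, x])
                B = np.hstack([B, (x / (0.5 + x.sum()))[:, None]])
                winner = T.shape[0] - 1
            else:                             # fast-learning update
                T[winner] = T[winner] * x
                B[:, winner] = T[winner] / (0.5 + T[winner].sum())
            labels.append(winner)
    return labels, B, T
```

On the five example vectors with ρ = 0.7, two epochs produce three output nodes whose templates match T(8) above, after which further epochs change nothing.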
[Figure: illustration of the categories (clusters) formed by ART networks in the input space.]
Note: increasing ρ leads to narrower cones, not to wider ones as suggested by the figure.
• A problem with ART1 is the need to determine the vigilance parameter for a given problem, which can be tricky.
• Furthermore, ART1 always builds clusters of the same size, regardless of the distribution of samples in the input space.
• Nevertheless, ART is one of the most important and successful attempts at simulating incremental learning in biological systems.
Applications of ART
• Face recognition
• Image compression
• Mobile robot control
• Target recognition
• Medical diagnosis
• Signature verification
Conclusion
An artificial neural network system must be able to adapt to a changing environment, but constant change can make the system unstable, because the system may learn new information only by forgetting everything it has learned so far. ART resolves this stability-plasticity dilemma.
Main References:
• S. Rajasekaran, G. A. V. Pai, "Neural Networks, Fuzzy Logic and Genetic Algorithms", Prentice Hall of India, Chapter 5: Adaptive Resonance Theory.
• Jacek M. Zurada, "Introduction to Artificial Neural Systems", West Publishing Company, Chapter 7: Matching & Self-Organizing Maps.
• Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37(1), 54-115.
• Adaptive Resonance Theory, Soft Computing lecture notes, http://www.myreaders.info/html/soft_computing.html
• Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Englewood Cliffs, NJ: Prentice-Hall.